Friday, September 18, 2009

Java Read/Write Files in UTF-8

Even after several years of Java experience, handling files with different encodings can still be confusing.

One thing that often goes unnoticed is that an encoding only matters when you read or write character data. Java provides the Readers and Writers for this purpose. They handle only textual content, and they bridge byte streams to character streams by encoding or decoding the bytes with a specified character set (charset). For binary data (or for text that you want to treat as raw bytes), use the plain input/output byte streams.
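To make the byte/character distinction concrete, here is a minimal sketch showing that the same string becomes different bytes under different charsets, and that decoding with the wrong charset corrupts the text. The class name `EncodingDemo` and its helper methods are made up for illustration, and the code assumes Java 7 or later for `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes the text as UTF-8 and returns the raw bytes.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encodes the same text as ISO-8859-1 (Latin-1) for comparison.
    static byte[] latin1Bytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the encoding of this
        // source file does not influence the result.
        String text = "caf\u00e9";

        byte[] utf8 = utf8Bytes(text);
        byte[] latin1 = latin1Bytes(text);
        System.out.println(utf8.length);   // 5: the accented letter takes two bytes
        System.out.println(latin1.length); // 4: one byte per character

        // Decoding UTF-8 bytes with the wrong charset yields different text.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        String right = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(text.equals(wrong)); // false
        System.out.println(text.equals(right)); // true
    }
}
```

The bytes themselves carry no hint of which charset produced them, which is why the charset has to be named explicitly wherever bytes are turned into characters or back.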

Reading textual content from a file using UTF-8 encoding:
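import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

public static String getContents(File aFile) {
    // StringBuilder is enough here: the buffer never leaves this method.
    StringBuilder result = new StringBuilder();
    BufferedReader br = null;

    try {
        int len = 0;
        char[] buffer = new char[8192];
        // InputStreamReader decodes the file's bytes into chars as UTF-8.
        br = new BufferedReader(new InputStreamReader(
                new FileInputStream(aFile), "UTF-8"));
        // read() returns the number of chars read, or -1 at end of stream.
        while ((len = br.read(buffer)) != -1) {
            result.append(buffer, 0, len);
        }
    } catch (FileNotFoundException fnfe) {
        fnfe.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        // Closing the outermost reader also closes the wrapped streams.
        try {
            if (br != null) {
                br.close();
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    return result.toString();
}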

Writing textual content to a file using UTF-8 encoding:
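import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public static void saveContent(String content, String fileName) {
    File outputFile = new File(fileName);
    // getParentFile() returns null when fileName has no directory part,
    // so a bare file name such as "out.txt" is handled safely.
    File directory = outputFile.getParentFile();
    if (directory != null) {
        directory.mkdirs();
    }

    BufferedWriter writer = null;
    try {
        // OutputStreamWriter encodes the chars into bytes as UTF-8.
        writer = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(outputFile), "UTF-8"));
        writer.write(content);
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        // close() flushes the buffered content before releasing the file.
        try {
            if (writer != null) {
                writer.close();
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}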
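For comparison, here is a rough sketch of the same write and read steps using the `java.nio.file.Files` helpers, with a round trip through a temporary file as a quick sanity check. This is an alternative and not the approach shown above. It assumes Java 8 or later, and the class name `Utf8RoundTrip` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8RoundTrip {
    // Writes the text as UTF-8, creating missing parent directories first.
    static void write(Path file, String content) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the whole file and decodes its bytes as UTF-8.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Writes the text to a temporary file, reads it back, and reports
    // whether the result equals the input.
    static boolean roundTrips(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                write(file, text);
                return text.equals(read(file));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // German and Chinese characters, escaped so the encoding of this
        // source file does not matter.
        System.out.println(roundTrips("Gr\u00fc\u00dfe, \u4e16\u754c")); // true
    }
}
```

Because `read` loads the entire file into memory, this variant suits small files. The buffered reader loop shown earlier is the safer pattern when a file may be large.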
