Reading a whole file into a String in Java is tricky; one has to pay attention to several things:
- Read with the proper character set (encoding).
- Don't ignore the newline at the end of the file.
- Don't waste CPU and memory by concatenating String objects in a loop (use a StringBuffer or an ArrayList<String> instead).
- Don't waste memory (by line-buffering or double-buffering).
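To illustrate the second point, here is a minimal sketch of how a line-based reader can lose the final newline. The class `TrailingNewlineDemo` and the helper `readByLines` are hypothetical names made up for this illustration; the sketch reads from an in-memory string rather than a real file to stay self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class TrailingNewlineDemo {
    // Hypothetical naive reader: collects lines with readLine() and joins them with '\n'.
    // readLine() strips line terminators, so the loop cannot tell whether the
    // input ended with a newline or not.
    static String readByLines(String input) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(input))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n');
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "first\nsecond\n";
        String roundTripped = readByLines(original);
        System.out.println(original.equals(roundTripped)); // false
        System.out.println(roundTripped.length());         // 12, one less than original.length()
    }
}
```

Appending a newline after every line would not fix this either: it would add a trailing newline to files that do not have one.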
See my solution at http://stackoverflow.com/questions/1656797/how-to-read-a-file-into-string-in-java/1708115#1708115
For your convenience, here is my code:
// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
    throws java.io.IOException {
  java.io.InputStream is = new java.io.FileInputStream(fileName);
  try {
    final int bufsize = 4096;
    int available = is.available();
    byte[] data = new byte[available < bufsize ? bufsize : available];
    int used = 0;
    while (true) {
      if (data.length - used < bufsize) {
        byte[] newData = new byte[data.length << 1];
        System.arraycopy(data, 0, newData, 0, used);
        data = newData;
      }
      int got = is.read(data, used, data.length - used);
      if (got <= 0) break;
      used += got;
    }
    return charsetName != null ? new String(data, 0, used, charsetName)
                               : new String(data, 0, used);
  } finally {
    is.close();
  }
}
2 comments:
You don't really trust JVM implementations too much. :-) Otherwise you'd have used an InputStreamReader + a StringBuilder and appended each chunk of data read in to the end of the temporary buffer. Instead you decided to use a custom managed buffer (data[]) and handle the buffer allocation yourself.
It'd be interesting to see which performs better.
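For comparison, the InputStreamReader + StringBuilder variant described above could look roughly like the sketch below. This is only an assumption about what such an implementation might be, not code from the post: the class name `ReaderSketch`, the method `readAll` and the 8192-char chunk size are made up for illustration, and the sketch takes an InputStream so that it can be tried without a file on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReaderSketch {
    // Hypothetical alternative: decode through a Reader and collect the chars
    // in a StringBuilder. charsetName can be null to use the default charset.
    // The caller stays responsible for closing the stream.
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = charsetName != null
                ? new InputStreamReader(in, charsetName)
                : new InputStreamReader(in);
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[8192]; // assumed chunk size, not tuned
        int got;
        while ((got = reader.read(chunk, 0, chunk.length)) != -1) {
            sb.append(chunk, 0, got); // append each chunk to the growing buffer
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes), "UTF-8");
        System.out.println(text.equals("line one\nline two\n")); // true
    }
}
```

To try it on a real file, pass a `java.io.FileInputStream` and close it in a finally block, as the code in the post does. Which version is faster would have to be measured.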
Btw. I'd choose a buffer size larger than 4K. If your filesystem's block (or sector ... whatever you call it) size is larger than 4K, then a larger buffer size will perform better. If it's 4K or less, then a buffer size that is a multiple of the block size performs the same as a buffer size equal to the block size. At least in theory. :-)
@müzso: Thanks for your comments. I agree that a larger buffer size might be faster.
Feel free to write your own implementation and measure the speed difference.
I guess you mean FileReader instead of InputStreamReader -- never mind, it doesn't make much difference.
I'd never use a *Reader for reading the entire file, because there might be a UTF-8 multibyte sequence at the buffer boundary. Detecting and cutting off such a partial sequence would make buffering inefficient.
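A minimal sketch of the boundary problem, assuming a hypothetical helper `decodeInChunks` that decodes every chunk of bytes on its own. It only illustrates why converting the whole byte array to a String in a single step is safe; it says nothing about how a particular Reader implementation deals with the boundary.

```java
import java.nio.charset.StandardCharsets;

final class SplitSequenceDemo {
    // Hypothetical naive decoder: turns each chunk into a String independently,
    // so a multibyte sequence that straddles two chunks is decoded as two
    // incomplete pieces.
    static String decodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                          // the last char takes 2 bytes in UTF-8
        byte[] data = original.getBytes(StandardCharsets.UTF_8); // 5 bytes in total

        // A chunk size of 4 splits the 2-byte sequence between two chunks.
        System.out.println(original.equals(decodeInChunks(data, 4)));           // false
        // Decoding all bytes in one step keeps the sequence intact.
        System.out.println(original.equals(decodeInChunks(data, data.length))); // true
    }
}
```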
I'd never use a StringBuilder for reading the entire file, because it involves copying the data unnecessarily.
I doubt that there is code for reading an entire file that is faster than the code in my blog post. If you can show otherwise, please give an example.