10

I have a text file (XML created with XStream) which is 63000 lines (3.5 MB) long. I'm trying to read it using Buffered reader:

                BufferedReader br = new BufferedReader(new FileReader(file));
                try {
                    String s = "";
                    String tempString;
                    int i = 0;
                    while ((tempString = br.readLine()) != null) {
                        s = s.concat(tempString);
//                        s=s+tempString;
                        i = i + 1;
                        if (i % 1000 == 0) {
                            System.out.println(Integer.toString(i));
                        }
                    }
                    br.close();

Here you can see my attempts to measure reading speed. And it's very low. It takes seconds to read 1000 lines after 10000 line. I'm clearly doing something wrong, but can't understand what. Thanks in advance for your help.

lozga
  • 117
  • 1
  • 8
  • Is your intent to parse this file? Why not just load it with Xerces/SAX/ other parsing tool? – Visionary Software Solutions Apr 06 '13 at 10:26
  • 10
    String `+` and `concat` is very inefficient if the Strings are large. Use `StringBuilder` or pass the `InputStream`/`Reader` straight to the xml parser. – Paul Grime Apr 06 '13 at 10:27
  • Or if you truly need lines, use something like this - http://commons.apache.org/proper/commons-io/javadocs/api-2.4/org/apache/commons/io/IOUtils.html#readLines%28java.io.Reader%29. – Paul Grime Apr 06 '13 at 10:29
  • Yes, I'm trying to parse this file and input it into Xstream again to read saved class. Lines are not critical. – lozga Apr 06 '13 at 10:31
  • 1
    If you need it in XStream, why don't you just pass the reader directly to XStream instead of reading it yourself and then passing the string. – Mark Rotteveel Apr 06 '13 at 10:36
  • eg, parsedObject = new XStream().fromXml(br); http://xstream.codehaus.org/javadoc/com/thoughtworks/xstream/XStream.html#fromXML(java.io.Reader) – Tom Carchrae Apr 06 '13 at 10:38
  • Oh! I've missed that it's possible to input file to XStream .fromXML mehod. Thanks a lot. – lozga Apr 06 '13 at 10:41

4 Answers4

4

@PaulGrime is right. You are copying the string each time the loop reads a line. Once the string gets big (say 10,000 lines big), it is doing a lot of work to do that copying.

Try this:

StringBuilder sb = new StringBuilder();
while (...reading lines..){ 
   ....
   sb.append(tempString);  //should add newline
   ...
}

s = sb.toString();

Note: read Paul's answer below on why stripping newlines makes this a bad way to read in a file. Also, as mentioned in the question comments, XStream provides a way to read the file and even if it had not, IOUtils.toString(reader) would be a safer way to read a file.

Tom Carchrae
  • 5,916
  • 2
  • 32
  • 33
  • Thanks! Really speeded loading up. – lozga Apr 06 '13 at 10:42
  • 1
    -1 the performance penalty is just not copying, Stringbuilder is the one advised in the documentation, `PaulGrime is right` is not really an answer which deserve to be accepted... and 10000? why? – UmNyobe Apr 06 '13 at 10:51
  • I said, "say 10,000" which is to mean, "for example, when 10,000 lines big". I also explained why Paul was right and gave a code example. Also, please clarify what you mean by "not just copying". – Tom Carchrae Apr 06 '13 at 11:26
  • I also changed to StringBuilder from StringBuffer - they are equivalent classes, but StringBuilder is not threadsafe (which is fine in this case). – Tom Carchrae Apr 06 '13 at 11:29
4

Some immediate improvements you can do:

  • Use a StringBuilder instead of concat and +. Using + and concat can really affect the performance, specially when used in loops.
  • Reduce the access to the disk. You can do it by using a large buffer:

    BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);

Maroun
  • 87,488
  • 26
  • 172
  • 226
3

You should use a StringBuilder as String concatenation is extremely slow for even small strings.

Further, try using NIO rather than a BufferedReader.

public static void main(String[] args) throws IOException {
    final File file = //some file
    try (final FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
        final StringBuilder stringBuilder = new StringBuilder();
        final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        final CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
        while (fileChannel.read(byteBuffer) > 0) {
            byteBuffer.flip();
            stringBuilder.append(charsetDecoder.decode(byteBuffer));
            byteBuffer.clear();
        }
    }
}

You can tune the buffer size if it's still too slow - it's heavily system dependent what buffer size works better. For me it makes very little difference if the buffer is 1K or 4K but on other systems I have know that change to increase speed by an order of magnitude.

Boris the Spider
  • 54,398
  • 6
  • 98
  • 152
1

In addition to what has already been said, depending on your use of the XML, your code is potentially incorrect as it discards line endings. For example, this code:

package temp.stackoverflow.q15849706;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

import com.thoughtworks.xstream.XStream;

public class ReadXmlLines {
    public String read1(BufferedReader br) throws IOException {
        try {
            String s = "";
            String tempString;
            int i = 0;
            while ((tempString = br.readLine()) != null) {
                s = s.concat(tempString);
                // s=s+tempString;
                i = i + 1;
                if (i % 1000 == 0) {
                    System.out.println(Integer.toString(i));
                }
            }
            return s;
        } finally {
            br.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ReadXmlLines r = new ReadXmlLines();

        URL url = ReadXmlLines.class.getResource("xml.xml");
        String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url
                .openStream())));

        Object ob = null;

        XStream xs = new XStream();
        xs.alias("root", Root.class);

        // This is incorrectly read/parsed, as the line endings are not
        // preserved.
        System.out.println("----------1");
        System.out.println(xmlStr);
        ob = xs.fromXML(xmlStr);
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the URL directly
        ob = xs.fromXML(url);
        System.out.println("----------2");
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the InputStream
        // directly
        ob = xs.fromXML(url.openStream());
        System.out.println("----------3");
        System.out.println(ob);
    }

    public static class Root {
        public String script;

        public String toString() {
            return script;
        }
    }
}

and this xml.xml file on the classpath (in the same package as the class):

<root>
    <script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]>
    </script>
</root>

produces the following output. The first two lines shows the line endings have been removed, and thus made the Javascript in the CDATA section invalid (as the first JS comment now comments out the whole JS, because the JS lines have been merged).

----------1
<root>    <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}]]>    </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}    
----------2


// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
...
Paul Grime
  • 14,517
  • 4
  • 31
  • 55