
I have to read a `URLConnection` response containing 2 MB of pretty-printed JSON in Java.

2 MB is not "small", but it is by no means large either. The response is pretty-printed JSON with around 60k lines, and a loop like

    while ((line = bufferedReader.readLine()) != null) {
        lineAllOfIt += line;
    }

takes around 10 minutes to read this response. There must be something wrong with my approach, but I cannot think of a better one.

Nicolas Filotto
Georg Heiler
  • I assume you mean MB, otherwise your file would be tiny at 2 milliBit :P – MrKickkiller Jun 01 '16 at 12:04
  • `lineAllOfIt += line;` is "wrong" because strings are immutable, so every iteration creates a new, ever-larger string. Use a StringBuilder, or do it like http://stackoverflow.com/a/37079572/995891 – zapl Jun 01 '16 at 12:07
  • Do you want to write an answer? This is the solution. – Georg Heiler Jun 01 '16 at 12:14
  • What do you want to do with your JSON? Parse it, no? – Nicolas Filotto Jun 01 '16 at 12:15
  • I don't believe it is a good idea to load a 2 MB file into memory anyway, even in a StringBuilder, unless you do it only once and the operation cannot run in parallel; otherwise you will fill up your heap. – Nicolas Filotto Jun 01 '16 at 12:19
  • So which approach would you recommend instead? – Georg Heiler Jun 01 '16 at 12:20
  • @geoHeil see my question above – Nicolas Filotto Jun 01 '16 at 12:23
  • I see, but would you recommend it only because it avoids problems with the encoding scheme? Or does it also improve performance? – Georg Heiler Jun 01 '16 at 12:24
  • Depends on what you want to do with the data. Do you need it in memory anyway? Then just do it; 2 MB isn't that big. Do you need to extract only certain information? Then think about an incremental / streaming approach (related: http://stackoverflow.com/questions/444380/is-there-a-streaming-api-for-json ) where you only keep a small window of the data in memory. And if a small window is enough but you can't do what you want in one pass, do what Underbalanced suggests, because that lets you seek in the data. – zapl Jun 01 '16 at 12:51
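A minimal sketch of the fix suggested in the comments: accumulate the lines in a `StringBuilder` instead of using `+=`. As a rough estimate, each `+=` copies the whole accumulated string, so with about 60,000 lines of roughly 33 characters (2 MB / 60k), the original loop copies on the order of 33 × 60,000² / 2 ≈ 6 × 10^10 characters, while `StringBuilder.append` copies only the new line plus occasional buffer growth. The class `ReadAll` and method `readAll` are hypothetical names, and a `StringReader` stands in for the connection's stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAll {
    // Reads every line into one String using a StringBuilder (hypothetical helper).
    // Like the loop in the question, readLine() strips line terminators, so lines
    // are joined without newlines; this is usually harmless for pretty-printed JSON.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader bufferedReader = new BufferedReader(in)) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                sb.append(line); // copies only this line, not the whole buffer
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = "{\n  \"a\": 1,\n  \"b\": [1, 2]\n}";
        System.out.println(readAll(new StringReader(json)));
    }
}
```

With a real connection you would pass something like `new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)`, assuming the response is UTF-8.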

1 Answer


For this particular case, I would cache the file locally. Using Java, you can transfer the file to your computer with a low memory footprint; then you can go through it line by line, without loading the whole file into memory, and pull out the data you need, or load it all at once if you prefer.

EDIT: Renamed the variables; I pulled this from my own code and forgot to neutralize the names. Also, FileChannel transferTo/transferFrom can be much more efficient, as there are potentially fewer copies and, depending on the operation, the data could go directly from the socket buffer to disk (see the FileChannel API).

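    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ReadableByteChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class CachedDownload {
        public static void main(String[] args) throws IOException {
            String fileUrlString = "http://update.domain.com/file.json"; // File URL path
            Path diskSaveLocation = Paths.get("file.json"); // Saved in the working directory

            final URLConnection conn = new URL(fileUrlString).openConnection();
            // -1 if the server sends no Content-Length; this code assumes it does
            final long fileLength = conn.getContentLengthLong();
            System.out.println(String.format("Downloading file... %s, Size: %d bytes.", fileUrlString, fileLength));
            try (FileOutputStream stream = new FileOutputStream(diskSaveLocation.toFile(), false);
                 FileChannel fileChannel = stream.getChannel();
                 ReadableByteChannel inChannel = Channels.newChannel(conn.getInputStream())) {
                long readerPosition = 0;
                // transferFrom returns the bytes transferred (possibly 0, never -1); treat 0 as end of stream
                while (readerPosition < fileLength) {
                    long read = fileChannel.transferFrom(inChannel, readerPosition, fileLength - readerPosition);
                    if (read <= 0) break;
                    readerPosition += read;
                }
            } finally {
                if (conn instanceof HttpURLConnection) ((HttpURLConnection) conn).disconnect();
            }
            if (fileLength != Files.size(diskSaveLocation)) {
                Files.delete(diskSaveLocation);
                System.out.println(String.format("File... %s did not download correctly, deleting file artifact!", fileUrlString));
            } else {
                System.out.println(String.format("File Download... %s completed!", fileUrlString));
            }
        }
    }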

You can now read the cached file line by line with the NIO.2 methods (for example Files.newBufferedReader or Files.lines) without loading it all into memory. Scanner or RandomAccessFile also let you avoid reading the whole file into the heap. If you do want the whole file in memory, you can load it from the cached copy with many of the utility methods in Java's Files class.
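A minimal sketch of both options on the cached file, assuming it is UTF-8: `Files.lines` reads it lazily, one line at a time, so only the lines you keep stay on the heap, while `Files.readAllBytes` loads it whole. The class and method names (`ReadCachedFile`, `linesContaining`, `readWhole`) and the substring filter are hypothetical stand-ins for whatever extraction you actually need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadCachedFile {
    // Hypothetical helper: streams the file lazily and keeps only matching lines.
    static List<String> linesContaining(Path file, String needle) {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(l -> l.contains(needle)).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: loads the whole cached file into one String.
    static String readWhole(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A small temporary file stands in for the downloaded file.json
        Path tmp = Files.createTempFile("cached", ".json");
        Files.write(tmp, "{\n  \"name\": \"x\",\n  \"size\": 2\n}\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(linesContaining(tmp, "\"size\""));
        System.out.println(readWhole(tmp).length());
        Files.delete(tmp);
    }
}
```

For pulling specific fields out of JSON, a streaming JSON parser, as suggested in the comments, is more robust than matching substrings line by line.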

Related: Java Read Large Text File With 70million line of text

Community
Mr00Anderson