5

I download a web page with HttpURLConnection.getInputStream(), and to get the content into a String I use the following method:

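String content = "";
String line;
// pageContent is the InputStream returned by HttpURLConnection.getInputStream()
InputStreamReader isr = new InputStreamReader(pageContent);
BufferedReader br = new BufferedReader(isr);
try {
    // Check for null before appending, so the text "null" is never added at the end
    while ((line = br.readLine()) != null) {
        content += line;
    }
    return content;
} catch (Exception e) {
    System.out.println("Error: " + e);
    return null;
}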

The download of the page is fast, but the processing to get the content into a String is very slow. Is there a faster way to get the content into a String?

I convert it to a String so that I can insert it into the database.

Renato Dinhani
  • 1
    How can the download be fast? You're downloading partially and appending to the string simultaneously. – asgs May 06 '11 at 17:59
  • pageContent contains the downloaded content as an InputStream. What I do in this code is transform the InputStream into a String. – Renato Dinhani May 06 '11 at 18:02
  • Getting the content from the `InputStream` is what is called downloading. – asgs May 06 '11 at 18:03
  • Oh, sorry. Yes, this is the part that is slow, but I don't know another way. – Renato Dinhani May 06 '11 at 18:08
  • 1
    See [read-text-from-inputstream](http://stackoverflow.com/questions/1891606/read-text-from-inputstream#answer-1894244). – asgs May 06 '11 at 18:44

4 Answers

2

Read into a buffer in fixed-size chunks rather than by something arbitrary like lines. That alone should be a good start on speeding this up, since the reader will not have to search for line ends.
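
A minimal sketch of this suggestion, not a definitive implementation. It assumes the page is UTF-8 encoded (the question does not say), and `ChunkedReadExample` and `readAll` are hypothetical names chosen for illustration. The chunk size of 8192 characters is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ChunkedReadExample {

    // Hypothetical helper: drains the stream in fixed-size character chunks.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[8192]; // arbitrary chunk size (assumption)
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            // read(char[]) returns the number of chars read, or -1 at end of stream
            while ((n = reader.read(buffer)) != -1) {
                content.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the stream from HttpURLConnection.getInputStream()
        InputStream pageContent = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(pageContent, StandardCharsets.UTF_8));
    }
}
```

Unlike a `readLine()` loop, this keeps the line terminators of the page exactly as they were downloaded.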

Gregory A Beamer
1

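Use a StringBuffer instead of +=. Each String concatenation copies everything built so far, so the loop gets slower as the content grows.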

Edit for an example:

StringBuffer buffer = new StringBuffer();

for (int i = 0; i < 20; ++i)
  buffer.append(i); // append(int) converts the number to text; an int has no toString() method

String result = buffer.toString();
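
Applied to the loop from the question, the same idea might look like the sketch below. It swaps in `StringBuilder`, the unsynchronized counterpart of `StringBuffer`, which is usually enough when only one thread builds the string. The names `LineLoopExample` and `readLines` are hypothetical, and UTF-8 is an assumption about the page encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LineLoopExample {

    // Hypothetical helper: the question's line loop, appending to a builder
    // instead of creating a new String on every iteration.
    static String readLines(InputStream in, Charset charset) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            // readLine() returns null at end of stream and strips line terminators,
            // so the lines are joined without separators, as in the question.
            while ((line = br.readLine()) != null) {
                content.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the stream from HttpURLConnection.getInputStream()
        InputStream pageContent = new ByteArrayInputStream(
                "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(pageContent, StandardCharsets.UTF_8));
    }
}
```

If the line breaks matter for what is stored in the database, append a `'\n'` after each line.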
Blindy
0

I'm using jsoup to fetch specific content from a page. Here is a web demo based on jQuery and jsoup that grabs any content from a web page; you specify the ID or class of the page content you want to fetch: http://www.gbin1.com/technology/democenter/20120720jsoupjquerysnatchpage/index.html

terry
0

Use a BLOB/CLOB to put the content directly into the database. Is there any specific reason for building the string line by line before putting it in the database?

sudmong