I'm putting together a small script to populate some data based on a web page I saved locally (http://payday.wikia.com/wiki/Achievements_(Payday_2)).

The script:

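import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// The class name is arbitrary; it is only here so the snippet compiles.
public class CountUnlocks {

    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("C:\\Users\\Jester\\Desktop\\data scrap payday\\Achievements_(Payday_2).htm");
        int count = 0; // tokens containing "unlock" or "Unlock"
        int words = 0; // all whitespace-delimited tokens read
        Scanner scanner = new Scanner(file);
        while (scanner.hasNext()) {
            String nextToken = scanner.next();
            if (nextToken.contains("unlock") || nextToken.contains("Unlock")) {
                count++;
            }
            words++;
            System.out.println(nextToken);
        }
        scanner.close();
        System.out.println(count);
        System.out.println(words);
    }
}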

However, the while loop ends partway through the file, on this line:

<td style="vertical-align: top; width: 64px"> <a href="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029"  class="image image-thumbnail"           ><img src="data:image/gif;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAQAICTAEAOw%3D%3D"   alt="From Russia With Love"    class="lzy lzyPlcHld "      data-image-key="From_Russia_With_Love.jpg"  data-image-name="From Russia With Love.jpg"      data-src="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029"       width="64"      height="64"             onload="if(typeof ImgLzy===&#39;object&#39;){ImgLzy.load(this)}"   ><noscript><img src="http://vignette3.wikia.nocookie.net/payday/images/d/db/From_Russia_With_Love.jpg/revision/latest?cb=20131103145029"     alt="From Russia With Love"    class=""        data-image-key="From_Russia_With_Love.jpg"  data-image-name="From Russia With Love.jpg"          width="64"      height="64"                ></noscript></a>

with the last word being:

href="http://vignette3.wikia.nocookie.net/p

(I'm not sure why the token is cut off halfway either; there is no whitespace inside it.)

If I remove that line from the file, the loop just stops early at a different line further on. Several lines throughout the HTML seem to trigger the end condition, but I can't figure out what they have in common.

Any ideas as to why scanner.hasNext() is returning false on these?
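
One way to narrow this down, offered as a diagnostic sketch rather than a confirmed explanation: `Scanner` does not throw I/O errors from its underlying source. It records them and behaves as if the input had ended, and `Scanner.ioException()` returns the last such error. A character-decoding failure is one plausible candidate here, but that is an assumption about this particular file. The class name `ScanDiagnostics`, the helper `countTokens`, and the choice of `"UTF-8"` as the default charset are all hypothetical and not part of the original script.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class ScanDiagnostics {

    /** Token count plus any I/O error the Scanner recorded while reading. */
    static final class Result {
        final int tokens;
        final IOException error; // null if the Scanner reported no I/O error

        Result(int tokens, IOException error) {
            this.tokens = tokens;
            this.error = error;
        }
    }

    /**
     * Reads every whitespace-delimited token from the file using an explicit
     * charset, then asks the Scanner whether it suppressed an I/O error.
     */
    static Result countTokens(File file, String charsetName) throws IOException {
        try (Scanner scanner = new Scanner(file, charsetName)) {
            int tokens = 0;
            while (scanner.hasNext()) {
                scanner.next();
                tokens++;
            }
            // Must be queried before the Scanner is closed by try-with-resources.
            return new Result(tokens, scanner.ioException());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java ScanDiagnostics <path-to-file> [charset]
        String charsetName = args.length > 1 ? args[1] : "UTF-8"; // assumed default
        Result result = countTokens(new File(args[0]), charsetName);
        System.out.println("tokens read: " + result.tokens);
        System.out.println("suppressed I/O error: " + result.error);
    }
}
```

If the second line of output shows an exception instead of `null`, the loop stopped because of a read error and not because the file actually ran out of tokens. Trying the charset the page was saved in would then be the next thing to check.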

JesterXIII