I was working on a homework problem that involves removing all of the HTML tags "<...>" from the text of an HTML file and then counting all of the tokens in the remaining text.
I wrote a solution that works but it all comes down to a single line of code that I didn't actually write and I'm curious to learn more about how this kind of code works.
    public static int tagStrip(Scanner in) {
        int count = 0;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            line = line.replaceAll("<[^>\r\n]*>", ""); // strip every "<...>" tag from the line
            Scanner scan = new Scanner(line);
            while (scan.hasNext()) {
                scan.next(); // consume one whitespace-delimited token
                count++;
            }
        }
        return count;
    }
The replaceAll() line is the one I'm curious about. I understand how the replaceAll() method itself works, but I'm not sure how the pattern "<[^>\r\n]*>" works. I read a little about regular expression patterns and experimented with it a bit.
I replaced it with "<[^>]+>" and it still gives exactly the same result. So I was hoping somebody could explain what each of these characters does, especially in the context of this kind of program.
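For anyone who wants to try the two patterns side by side, here is a minimal sketch. The class name TagStripDemo, the helper strip(), and the sample strings are all hypothetical and not part of my assignment. It suggests the patterns only differ on an empty "<>" pair, since `*` allows zero characters between the brackets while `+` requires at least one.

```java
public class TagStripDemo {
    // Pattern from the code above: '<', then zero or more characters that are
    // not '>', carriage return, or line feed, then '>'.
    static final String ORIGINAL = "<[^>\r\n]*>";

    // Variant I tried: '<', then one or more characters that are not '>', then '>'.
    static final String VARIANT = "<[^>]+>";

    // Hypothetical helper: removes every match of regex from line.
    static String strip(String line, String regex) {
        return line.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Made-up sample input containing ordinary tags.
        String sample = "<p>Hello <b>world</b></p>";
        System.out.println(strip(sample, ORIGINAL)); // Hello world
        System.out.println(strip(sample, VARIANT));  // Hello world

        // Made-up sample input containing an empty "<>" pair.
        String empty = "a <> b";
        System.out.println(strip(empty, ORIGINAL)); // "<>" is removed
        System.out.println(strip(empty, VARIANT));  // "<>" is left in place
    }
}
```

As far as I can tell, the `\r\n` exclusion only matters for strings that still contain line breaks. Here each line comes from nextLine(), which drops the line terminator, so that would explain why both patterns behave the same on my input.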