I need to extract a link from a block of HTML, and I'm using regex patterns for that. The problem is that the match includes the text before and after (.*?). Should it do that? I thought it would only include the text between the boundaries.

I've modified the code a little, and now it only includes the quote.

Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile(.*?)");
Matcher m = p.matcher(splitStrings[0]);

This is the output:

[http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab",...

This is what I expected:

[http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab

2 Answers

2

You can do something like this:

Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile(.*?)(?=\")");

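The (?=\") at the end is called a positive lookahead. It requires a double quote to follow but does not include it in the match, so the lazy (.*?) stops just before the closing quote. You can find a good explanation here.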
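As a rough illustration of the lookahead approach, here is a minimal, self-contained sketch. The class name LookaheadDemo, the helper extract, and the sample HTML string are made up for this example. The dots in the host name are also escaped here, which the original pattern does not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: returns the first profile URL, stopping just
    // before the closing double quote, or null if nothing matches.
    static String extract(String html) {
        // (?=\") asserts that a quote follows without consuming it.
        // The dots are escaped so they match literal dots only (an addition
        // to the pattern from the answer).
        Pattern p = Pattern.compile(
                "http://cdn\\.posh24\\.se/images/:profile(.*?)(?=\")");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        // Sample input invented for the demo.
        String html = "<img src=\"http://cdn.posh24.se/images/:profile/"
                + "088484075fb5b4418f5cb8814728decab\" alt=\"x\">";
        // prints http://cdn.posh24.se/images/:profile/088484075fb5b4418f5cb8814728decab
        System.out.println(extract(html));
    }
}
```

Without the lookahead, a trailing lazy (.*?) is free to match the empty string, which is probably why the original pattern returned only the fixed prefix.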

Saeed Entezari
0
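// [^\"]* matches everything up to, but not including, the next double quote.
Pattern p = Pattern.compile("http://cdn.posh24.se/images/:profile([^\"]*)");
Matcher m = p.matcher(splitStrings[0]);

// Print every match found in the input.
while (m.find()) {
    System.out.println(m.group(0));
}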
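For completeness, the same idea can be wrapped into a small runnable sketch that collects every link rather than printing it. The class name ProfileLinks, the method findAll, and the sample HTML are assumptions made for this example, and the dots in the host name are escaped here, unlike in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ProfileLinks {
    // [^\"]* consumes characters until the next double quote, so the
    // quote itself never becomes part of the match.
    private static final Pattern LINK = Pattern.compile(
            "http://cdn\\.posh24\\.se/images/:profile([^\"]*)");

    // Hypothetical helper: returns all profile URLs found in the input.
    static List<String> findAll(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(0)); // group(0) is the whole match
        }
        return links;
    }

    public static void main(String[] args) {
        // Sample input invented for the demo.
        String html = "<img src=\"http://cdn.posh24.se/images/:profile/aaa\"> "
                + "<img src=\"http://cdn.posh24.se/images/:profile/bbb\">";
        // prints [http://cdn.posh24.se/images/:profile/aaa, http://cdn.posh24.se/images/:profile/bbb]
        System.out.println(findAll(html));
    }
}
```

Here m.group(1) would return only the part after ":profile", which may be handy if just the image identifier is needed.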
Jason Aller
Marc G. Smith