I want to delete the whitespace characters between /> and <. After reading "What is a non-capturing group? What does (?:) do?", I used the pattern (?<=\\>)(\\s*\r?\n?)(?=\\<). But I found that the pattern (?<=\\>)(?:\\s*\r?\n?)(?=\\<) works just as well. Here is my test case:

<a>asdfdsf</a> \n     <b> hahaha <b/>     <a>asdfdsf</a> \n     <b> hahaha <b/>   <a>asdfdsf</a> \n     <b> hahaha <b/>

s.replaceAll("(?<=\\>)(?:\\s*\r?\n?)(?=\\<)", "")
s1.replaceAll("(?<=\\>)(\\s*\r?\n?)(?=\\<)", "")

Both patterns gave me the same result:

<a>asdfdsf</a><b> hahaha <b/><a>asdfdsf</a><b> hahaha <b/><a>asdfdsf</a><b> hahaha <b/>

How does (?:\s*\r?\n?)(?=\<) work? Why do the two patterns give the same result?

user3483203
M Chen
  • In addition, you should probably avoid using regex to parse HTML content. Use an HTML/DOM parser instead. – Tim Biegeleisen Apr 10 '18 at 05:21
  • Sorry that your question was marked as a duplicate. Basically the answer is that they do the same thing because you're not using capture groups. Capture groups won't affect whether a given segment of text is a match. They are used for querying specific parts of matches. But all you're doing is deleting anything that matches, so they both work. – William Rosenbloom Apr 10 '18 at 05:24
  • @TimBiegeleisen As I said, I came up with this pattern after reading that answer. The main point is that I think the first way is wrong, but it still gives the same result. – M Chen Apr 10 '18 at 05:25
  • There is _no_ functional difference between those two patterns, because you don't use the capture groups, you just replace with empty string. – Tim Biegeleisen Apr 10 '18 at 05:25
  • I guess the title should have been "why does replaceAll replace characters even when they are in a non-capturing group". To actually answer the question: it replaces everything that matches, and the text inside a non-capturing group is still part of the match. https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll(java.lang.String,%20java.lang.String) – Marcus Apr 10 '18 at 05:26
  • @Marcus It looks like only the text matched by **?:** gets replaced; the text checked by **?<=** and **?=** does not. – M Chen Apr 10 '18 at 05:36
  • @MChen lookarounds (like `?<=` and `?=`) do not consume characters; they only assert whether a match is possible at that position – Marcus Apr 10 '18 at 05:41
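
To make the points from the comments concrete, here is a minimal sketch that runs both patterns from the question. The class name `GroupDemo`, its helper methods, and the shortened sample input are made up for illustration. It shows that both patterns delete the same text, that the lookarounds are not part of what gets consumed, and that the two patterns only behave differently once the replacement refers to `$1`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // The two patterns from the question, unchanged.
    static final String CAPTURING = "(?<=\\>)(\\s*\r?\n?)(?=\\<)";
    static final String NON_CAPTURING = "(?<=\\>)(?:\\s*\r?\n?)(?=\\<)";

    // Deletes every match; the replacement never refers to a group.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text consumed by the first match, or null if there is none.
    static String firstMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "<a>x</a> \n   <b> y </b>"; // hypothetical sample input

        // Both calls print the same stripped string.
        System.out.println(strip(s, CAPTURING));
        System.out.println(strip(s, NON_CAPTURING));

        // The only structural difference: one group versus none.
        System.out.println(groupCount(CAPTURING) + " vs " + groupCount(NON_CAPTURING));

        // A replacement that refers to $1 works only with the capturing pattern.
        System.out.println(s.replaceAll(CAPTURING, "[$1]"));
        try {
            s.replaceAll(NON_CAPTURING, "[$1]");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("no group 1: " + e.getMessage());
        }
    }
}
```

`firstMatch` returns only the whitespace run, without the surrounding `>` and `<`, which is why `replaceAll` leaves the tags intact: the lookbehind and lookahead check their characters but do not include them in the match.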
