I want to get all the content between an opening span tag and its closing tag. The problem is that sometimes I have nested spans, and I want to be sure that my regex doesn't stop at the first closing span it sees.

To see my problem, look at this: Regex101: nested span

I want to be sure that I get everything between the opening and the closing tag, no matter how many </span> tags I find inside.

I have found a library made by Steven Levithan which could do what I want. The problem is that the examples are basic, and I am not sure I can achieve what I want with it.

I'm using the XRegExp.matchRecursive method. In the examples they give a start tag and an end tag. My start tag is a bit complicated; it looks like this: <span style=\\?"color:([a-zA-Z\s]*?)\\?">. The problem is that when I execute this method with this delimiter, I get this error: string contains unbalanced delimiters. The tested string is:

<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>

I think my problem comes from the regex I use as the start delimiter. As explained in the docs, we should add a level of backslash escaping in the regex. That's why I tried this regex as the start delimiter: <span style=\\\\?"color:([a-zA-Z\\s]*?)\\\\?">. It is still not working. I don't see how to tell this method to find everything between the span that has the color style attribute and its closing tag.

Maybe somebody has a solution?

Abdulla Nilam
Ganbin
  • Why oh why are you using regular expressions for this? If it's valid HTML, please use the DOM functions. – Ja͢ck Jul 07 '15 at 11:46
  • Of course with jquery I can do that in one line of code with the html() method. But I need to do this server-side in a Wakanda environment. – Ganbin Jul 07 '15 at 12:50
  • I'm not talking about jquery, pure JavaScript can do this as well; surely that's also available in a server environment. – Ja͢ck Jul 07 '15 at 12:52
  • Yes, for sure, with plain JavaScript in the browser it is easy to do. I would not have asked the question if it were client-side. On the server side we don't have a DOM parser that works like the client-side one, so we have to parse the string ourselves. – Ganbin Jul 07 '15 at 13:17
  • Then I've lost all hope in server-side JavaScript =( – Ja͢ck Jul 07 '15 at 13:18
  • I see there is a way to use jQuery on Node.js. Later, when I have time, I will try to use jQuery in the Wakanda environment. – Ganbin Jul 09 '15 at 11:04

2 Answers

Is there perhaps an option to use some kind of a parser that is more powerful than regular expressions? The latter are, generally speaking, not really suitable for parsing non-regular languages, even though they might provide certain extensions compared to "pure" regular expressions in theoretical sense.

plamut
  • The OP is using XRegExp, which (as I understand it) _**is**_ more powerful than using just regular expressions. That aside, further comments from the OP have ruled out using an even more appropriate tool for his/her use-case. (Alas.) – randomsimon Dec 10 '15 at 16:11

So the block you're hitting is the error "string contains unbalanced delimiters".

That would be because your start delimiter only matches one of the start span tags in your test input (the one that specifies the colour) but your end delimiter matches all four of the end span tags.
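
The imbalance can be checked by counting how often each delimiter matches. The following is only a sketch in plain JavaScript: the variable names and the `countMatches` helper are made up for illustration, the sample is the string from the question with its line breaks removed, and each `\\"` in the source stands for a literal backslash followed by a quote.

```javascript
// Sample input from the question; each \\" is a literal backslash + quote.
var input =
  '<p style=\\"text-align:justify\\">' +
  '<span style=\\"font-size:12pt\\">' +
  '<span style=\\"color:Green\\">' +
  '<span style=\\"font-family:Verdana\\">There is some content for a mm advertisment.There is some co</span>' +
  '<span style=\\"font-family:Times New Roman\\">ntent for a mm advertisment.</span>' +
  '</span>' +
  '</span>' +
  '</p>';

// The asker's start delimiter (written as a regex literal) and a plain end delimiter.
var startDelimiter = /<span style=\\?"color:([a-zA-Z\s]*?)\\?">/g;
var endDelimiter = /<\/span>/g;

// Hypothetical helper: number of non-overlapping matches of a global regex.
function countMatches(text, regex) {
  var matches = text.match(regex);
  return matches ? matches.length : 0;
}

var startCount = countMatches(input, startDelimiter); // 1
var endCount = countMatches(input, endDelimiter);     // 4
```

One opening delimiter against four closing delimiters is what the "unbalanced delimiters" error is reporting.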

I think you'll have to approach this by firstly matching all the span tags (with the library you've found) and then re-process to find the ones you care about.
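
As a rough illustration of that two-step idea, here is a sketch that does not use XRegExp.matchRecursive at all. It swaps in a plain JavaScript stack that tracks every span tag and keeps only the spans whose opening tag carries the color style. The function name `extractColorSpans` and the variable names are hypothetical, and it assumes that no attribute value contains a `>` character and that the tags are written exactly as in the question.

```javascript
// Returns [{ color, content }] for every span whose opening tag sets a color.
function extractColorSpans(text) {
  // Step 1: match every span tag, opening or closing.
  var anySpanTag = /<span\b[^>]*>|<\/span>/g;
  // Step 2: the asker's pattern, anchored so it tests one whole opening tag.
  var colorStyle = /^<span style=\\?"color:([a-zA-Z\s]*?)\\?">$/;

  var openSpans = []; // stack of currently open spans
  var results = [];
  var tag;

  while ((tag = anySpanTag.exec(text)) !== null) {
    if (tag[0] !== '</span>') {
      // Opening tag: remember where its content starts and its color, if any.
      var colorMatch = colorStyle.exec(tag[0]);
      openSpans.push({
        color: colorMatch ? colorMatch[1] : null,
        contentStart: anySpanTag.lastIndex
      });
    } else if (openSpans.length > 0) {
      // Closing tag: it belongs to the most recently opened span.
      var opened = openSpans.pop();
      if (opened.color !== null) {
        results.push({
          color: opened.color,
          content: text.slice(opened.contentStart, tag.index)
        });
      }
    }
  }
  return results;
}

// Usage with a shortened version of the question's string
// (each \\" is a literal backslash + quote).
var sample =
  '<span style=\\"font-size:12pt\\">' +
  '<span style=\\"color:Green\\">' +
  '<span style=\\"font-family:Verdana\\">A</span>' +
  '<span style=\\"font-family:Times New Roman\\">B</span>' +
  '</span>' +
  '</span>';

var found = extractColorSpans(sample);
// found[0].color is 'Green'; found[0].content holds both inner font-family spans.
```

Because every closing tag is paired with the most recently opened span, the nested closing tags no longer end the match early.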

randomsimon