Regex for specific XML Elements

Question

In my C# application I'm trying to delete some of my XML elements by filtering them out with a regular expression.

My input is, for example:

<myXMLTag id="Text1.Text2.Text3">
   <Anything/>
</myXMLTag>
<myXMLTag  id="Text1.ISHOULDNOTBEHERE.Text3">
   <Anything/>
</myXMLTag>
<myXMLTag  id="Text1.Text2.Text3">
    <Anything/>
</myXMLTag>

I tried some regular expressions on http://regexstorm.net/tester, but the pattern always matches the first two <myXMLTag> elements together and not just the middle one.

Pattern:

<myXMLTag.*Text1.+(ISHOULDNOTBEHERE)+.*?</(myXMLTag)>

I need a pattern that, within an XML string, matches only the XML elements that look like the middle one.

[XY problem](http://xyproblem.info/). Never ever use regex for parsing or manipulating XML. Use the XML functions of an XML library of your choice. – Uwe Keim, Jun 15 '18 at 05:35
Do you really need a + quantifier for the search keyword in question? — wp78de, Jun 15 '18 at 05:41
@UweKeim That's not the question. Thanks for repeating what I stated in my question, but the comment does not help a single bit. – Febertson, Jun 15 '18 at 05:57
https://stackoverflow.com/a/1732454/62576 This is about the ten thousandth question related to parsing X/HTML with regular expressions, and about the 10,000th time we've had to write *Stop wasting your time trying to parse XML with a regex and use a DOM parser instead.* — Ken White, Jun 15 '18 at 12:27
Why the rage? Every software developer knows that XML parsing with regex is shit. But sometimes you gotta go that way, even if you know it's wrong. If you haven't had to take this step yet, I'm happy for you, and I hope you never have to. – Febertson, Jun 18 '18 at 09:34

wp78de · Accepted Answer · 2018-06-15T05:43:38.503

1

Parsing XML using regex is definitely not a good idea. There is only a little room for shortcuts like this.

That said, try it like this:

<(myXMLTag)\s+id="[^"]+(ISHOULDNOTBEHERE)(?:(?!</\1>).)+</\1>

Demo

Explanation

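<(myXMLTag)\s+id=" serves as the start anchor and captures the tag name in group 1
[^"]+ negated character class that matches everything but ", so the match cannot leave the id value
ISHOULDNOTBEHERE obviously your keyword
(?:(?!</\1>).)+ tempered greedy token that matches everything up to the end tag, using a back reference; because the element spans several lines, . must also match newlines (RegexOptions.Singleline in .NET)
</\1> the end tag, again using a back reference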
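
Applied from C#, the pattern above could be used roughly like this. This is a minimal sketch, not code from the answer: the class and method names (ElementFilter, RemoveUnwanted) are hypothetical, and RegexOptions.Singleline is an assumption needed so that . can cross the line breaks inside an element.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern itself comes from the answer.
public static class ElementFilter
{
    // Pattern from the answer, written as a verbatim string ("" is a literal quote).
    private const string Pattern =
        @"<(myXMLTag)\s+id=""[^""]+(ISHOULDNOTBEHERE)(?:(?!</\1>).)+</\1>";

    // Singleline makes '.' match '\n', so the tempered token can span lines.
    private static readonly Regex Unwanted =
        new Regex(Pattern, RegexOptions.Singleline);

    // Returns the input with every matching element removed.
    public static string RemoveUnwanted(string xml)
    {
        return Unwanted.Replace(xml, string.Empty);
    }
}
```

Calling ElementFilter.RemoveUnwanted on the sample input from the question should leave the first and third elements in place; the line break that followed the removed element stays behind as an empty line.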

edited Jun 15 '18 at 05:43

answered Jun 15 '18 at 05:32

wp78de


Can you provide a regex for the other way round, one that finds only the XML elements that do NOT contain the "ISHOULDNOTBEHERE" keyword? :D – Febertson Jun 18 '18 at 08:28
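
For the reverse case asked about in this comment, one option is a negative lookahead on the id value. The following is only a sketch derived from the accepted pattern, not something posted in the thread: the names InverseElementFilter and FindWanted are hypothetical, and it assumes the same layout as the sample input (id is the first attribute and uses double quotes).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for the inverse match.
public static class InverseElementFilter
{
    // (?![^"]*ISHOULDNOTBEHERE) rejects the element if the keyword
    // appears anywhere before the closing quote of the id value.
    private static readonly Regex Wanted = new Regex(
        @"<(myXMLTag)\s+id=""(?![^""]*ISHOULDNOTBEHERE)[^""]*""(?:(?!</\1>).)*</\1>",
        RegexOptions.Singleline);

    // Returns the text of every element whose id lacks the keyword.
    public static List<string> FindWanted(string xml)
    {
        var result = new List<string>();
        foreach (Match match in Wanted.Matches(xml))
        {
            result.Add(match.Value);
        }
        return result;
    }
}
```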

score 1 · Answer 2 · answered Jun 15 '18 at 11:20

The standard response to questions about parsing XML using regular expressions is

RegEx match open tags except XHTML self-contained tags

That answer might seem over-the-top, but it's justified: most of us have seen the disastrous results that can arise if you attempt this. Basically, any program that tries to process XML using regexes will be slow and buggy. If you want to get results quickly and don't mind the bugs, then go ahead - and make sure you don't stay around with the project long enough to take the consequences.

Use an XML parser, it's the right tool for the job.
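
For comparison, here is roughly what the parser-based route could look like in C# with LINQ to XML. It is a sketch under assumptions: the names XmlElementFilter and RemoveUnwanted are hypothetical, and because the sample input has several top-level elements, it is wrapped in a temporary root element (called "root" here) before parsing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper showing the XML-parser alternative.
public static class XmlElementFilter
{
    public static string RemoveUnwanted(string xmlFragment)
    {
        // Wrap the fragment so it has a single root and can be parsed.
        XElement root = XElement.Parse("<root>" + xmlFragment + "</root>");

        // Select the elements whose id contains the keyword and remove them.
        root.Elements("myXMLTag")
            .Where(e => ((string)e.Attribute("id") ?? string.Empty)
                .Contains("ISHOULDNOTBEHERE"))
            .Remove();

        // Return the remaining elements without the temporary wrapper.
        return string.Join(Environment.NewLine, root.Nodes());
    }
}
```

Unlike the regex versions, this does not depend on attribute order, quoting style, or line layout, but the output is re-serialized, so the original whitespace is not preserved.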

That's exactly the point. The software I'm implementing this in will be deleted in the next few months, and it is built around working with regular expressions and XML. We all know this is bad, but sometimes you have to do stuff that you know is wrong. The function behind it is still needed as a quick and dirty solution. I'm not too happy about it either :D – Febertson, Jun 18 '18 at 06:36

Jorge.V · Answer 3 · 2018-06-15T05:31:41.207

0

This is a bit ugly, but as long as you respect the pattern in your example it should work:

.+ISHOULDNOTBEHERE.+\n.+\n<\/myXMLTag>

Test it here regex101

Within a line, match one or more of any character (.+)
Recognise the literal ISHOULDNOTBEHERE
Consume the rest of the line up to and including the \n (.+\n)
Consume one or more characters on the next line, plus its line break (.+\n)
Recognise the literal </myXMLTag>
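
The steps above can be tried from C# roughly as follows. This is a sketch with hypothetical names (LineBasedFilter, RemoveUnwanted); no regex options are set, so . stops at each \n and the pattern depends on the exact three-line layout of the sample.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the line-based pattern from this answer.
public static class LineBasedFilter
{
    // Opening line with the keyword, exactly one child line,
    // then the closing tag at the very start of the third line.
    private static readonly Regex Unwanted =
        new Regex(@".+ISHOULDNOTBEHERE.+\n.+\n<\/myXMLTag>");

    // Returns the input with every matching three-line element removed.
    public static string RemoveUnwanted(string xml)
    {
        return Unwanted.Replace(xml, string.Empty);
    }
}
```

If an element is written on a single line, has more than one child line, or has an indented closing tag, the pattern no longer matches and the input is returned unchanged.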

edited Jun 15 '18 at 05:31

answered Jun 15 '18 at 05:23

Jorge.V


By the way, OP, tell me if you'd like an explanation or if it is self-explanatory. – Jorge.V Jun 15 '18 at 05:24
It's okay, I can read regular expressions. I'll try this one out. – Febertson Jun 15 '18 at 05:26
@Jorge.V - I would like an explanation please. – Enigmativity Jun 15 '18 at 05:29
I couldn't get this to work correctly :( I can't even tell you why. The other answer worked out of the box. But thanks for your effort. – Febertson Jun 15 '18 at 06:29
