0

In my C# application I'm trying to delete some of my XML elements by filtering them out with a regular expression.

My input is, for example:

<myXMLTag id="Text1.Text2.Text3">
   <Anything/>
</myXMLTag>
<myXMLTag  id="Text1.ISHOULDNOTBEHERE.Text3">
   <Anything/>
</myXMLTag>
<myXMLTag  id="Text1.Text2.Text3">
    <Anything/>
</myXMLTag>

I tried some regular expressions on http://regexstorm.net/tester, but the pattern always matches the first two <myXMLTag> elements together and not just the middle one.

Pattern:

<myXMLTag.*Text1.+(ISHOULDNOTBEHERE)+.*?</(myXMLTag)>

I need a pattern that only finds the XML elements in an XML string that look like the middle one.

Essigwurst
Febertson
  • So do you want to match them all or just the middle one – TheGeneral Jun 15 '18 at 05:21
  • I just want the regex to match the middle one. – Febertson Jun 15 '18 at 05:23
  • [XY problem](http://xyproblem.info/). Never ever use regex for XML parsing or manipulation. Use the XML functions of an XML library of your choice. – Uwe Keim Jun 15 '18 at 05:35
  • Do you really need a + quantifier for the search keyword in question? – wp78de Jun 15 '18 at 05:41
  • @UweKeim That's not the question. Thanks for repeating what I stated in my question, but the comment does not help a single bit. – Febertson Jun 15 '18 at 05:57
  • https://stackoverflow.com/a/1732454/62576 This is about the ten thousandth question related to parsing X/HTML with regular expressions, and about the 10,000th time we've had to write *Stop wasting your time trying to parse XML with a regex and use a DOM parser instead.* – Ken White Jun 15 '18 at 12:27
  • Why the rage? Every software developer knows that XML parsing with regex is shit. But sometimes you have to go that way, even if you know it's wrong. If you haven't had to do that yet, I'm happy for you, and I hope you never have to. – Febertson Jun 18 '18 at 09:34

3 Answers

1

Parsing XML using regex is definitely not a good idea. There is only little room for cutting corners like this.

That said, try it like this:

<(myXMLTag)\s+id="[^"]+(ISHOULDNOTBEHERE)(?:(?!</\1>).)+</\1>

Demo

Explanation

  • <(myXMLTag)\s+id=" serves as start anchor
  • [^"]+ negated character class that matches any character except "
  • ISHOULDNOTBEHERE is your keyword
  • (?:(?!</\1>).)+ tempered greedy token that matches every character up to the end tag, using a backreference (in .NET, . only matches line breaks when RegexOptions.Singleline is set)
  • </\1> the end tag, again using a back reference
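
A minimal sketch of how this pattern could be applied from C#, assuming the XML is held in a string and that every matching element should simply be deleted. The helper name `RemoveMarkedElements` is hypothetical, `RegexOptions.Singleline` is added here so that `.` can cross the line breaks inside an element, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: deletes every <myXMLTag> element whose id contains the keyword.
static string RemoveMarkedElements(string xml)
{
    // Pattern from the answer, written as a verbatim string ("" is an escaped quote).
    const string pattern =
        @"<(myXMLTag)\s+id=""[^""]+(ISHOULDNOTBEHERE)(?:(?!</\1>).)+</\1>";

    // Singleline lets '.' match '\n', so one match can span all lines of an element.
    return Regex.Replace(xml, pattern, string.Empty, RegexOptions.Singleline);
}

// Sample input from the question.
string input =
    "<myXMLTag id=\"Text1.Text2.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.ISHOULDNOTBEHERE.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.Text2.Text3\">\n    <Anything/>\n</myXMLTag>\n";

Console.WriteLine(RemoveMarkedElements(input));
```

Only the element itself is removed, so the line break that followed it stays behind as an empty line; whether that matters depends on how the string is used afterwards.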
wp78de
  • Can you provide a regex for the other way round, one that only finds the XML elements that do NOT contain the "ISHOULDNOTBEHERE" keyword? :D – Febertson Jun 18 '18 at 08:28
1

The standard response to questions about parsing XML using regular expressions is

RegEx match open tags except XHTML self-contained tags

That answer might seem over-the-top, but it's justified: most of us have seen the disastrous results that can arise if you attempt this. Basically, any program that tries to process XML using regexes will be slow and buggy. If you want to get results quickly and don't mind the bugs, then go ahead - and make sure you don't stay around with the project long enough to take the consequences.

Use an XML parser; it's the right tool for the job.
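
As a rough sketch of what the parser route could look like in C#, the following uses LINQ to XML (`System.Xml.Linq`). It assumes the input is a fragment with several top-level elements, as in the question, and therefore wraps it in a temporary `<root>` element before parsing. The helper name `RemoveByIdKeyword` and the wrapper element are illustrative assumptions, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: removes <myXMLTag> elements whose id attribute contains the keyword.
static string RemoveByIdKeyword(string xmlFragment, string keyword)
{
    // The sample has several top-level elements, so wrap it in a temporary
    // root element (an assumption) to make it well-formed before parsing.
    XElement root = XElement.Parse("<root>" + xmlFragment + "</root>");

    // Select the offending elements and detach them from the tree.
    root.Elements("myXMLTag")
        .Where(e => ((string)e.Attribute("id") ?? "").Contains(keyword))
        .Remove();

    // Serialize the remaining elements again, without the temporary root.
    return string.Join("\n", root.Elements().Select(e => e.ToString()));
}

// Sample input from the question.
string input =
    "<myXMLTag id=\"Text1.Text2.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.ISHOULDNOTBEHERE.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.Text2.Text3\">\n    <Anything/>\n</myXMLTag>\n";

Console.WriteLine(RemoveByIdKeyword(input, "ISHOULDNOTBEHERE"));
```

Unlike the regex variants, this does not depend on whitespace, attribute order, or line layout, but the output is re-serialized and so may be formatted differently from the input.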

Michael Kay
  • That's exactly the point. The software I'm implementing this in will be deleted in the next few months, and it is built around regular expressions and XML. We all know this is bad, but sometimes you have to do stuff that you know is wrong. The function behind it is still needed as a quick and dirty solution. I'm not too happy about it either :D – Febertson Jun 18 '18 at 06:36
0

This is a bit ugly, but as long as you respect the pattern in your example it should work:

.+ISHOULDNOTBEHERE.+\n.+\n<\/myXMLTag>

Test it on regex101.

  • Starting at the beginning of a line, match one or more characters (.+)
  • Match the literal ISHOULDNOTBEHERE
  • Consume the rest of that line up to and including the line break (.+\n)
  • Consume the whole next line, including its line break (.+\n)
  • Match the literal </myXMLTag>
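
The steps above can be tried from C# with a sketch like the following, assuming each element spans exactly three lines as in the question. The helper name `FindMarkedElements` is hypothetical, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the elements matched by the line-based pattern.
static MatchCollection FindMarkedElements(string xml)
{
    // No RegexOptions.Singleline here: '.' has to stop at each line break,
    // so the pattern covers exactly three consecutive lines.
    return Regex.Matches(xml, @".+ISHOULDNOTBEHERE.+\n.+\n<\/myXMLTag>");
}

// Sample input from the question.
string input =
    "<myXMLTag id=\"Text1.Text2.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.ISHOULDNOTBEHERE.Text3\">\n   <Anything/>\n</myXMLTag>\n" +
    "<myXMLTag  id=\"Text1.Text2.Text3\">\n    <Anything/>\n</myXMLTag>\n";

foreach (Match m in FindMarkedElements(input))
{
    Console.WriteLine(m.Value);
}
```

Because the pattern counts lines, an element with more or fewer child lines than the sample would not be matched.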
Jorge.V