I want to find and replace the following string:

<tag a=“x” b=“y” c=“z”/>

However, the attributes can appear in any order, e.g.

<tag c=“z” b=“y” a=“x”/>
<tag b=“y” a=“x” c=“z”/>

What regex would find all instances of this string, regardless of attribute order?

Wiktor Stribiżew

2 Answers

I believe the regex query that you want is:

<tag ([a-z]){1}=“([a-z]){1}” ([a-z]){1}=“([a-z]){1}” ([a-z]){1}=“([a-z]){1}”/>

Let me explain the elements of this pattern:

  1. ([a-z]) matches a single lowercase letter from a to z and captures it.
  2. Adding {1} tells the engine to match the preceding element exactly once. This is already the default, so the quantifier is optional.
  3. So ([a-z]){1} matches exactly one lowercase letter.

If we apply this element on its own to the example, the matches will be:

<tag a=“x” b=“y” c=“z”/>

matches: t, a, g, a, x, b, y, c, z, in that order.
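
To make the point about single-letter matches concrete, here is a minimal Python sketch. It assumes Python's `re` module (the thread does not fix an engine) and uses the sample string from the question with its curly quotes left as written.

```python
import re

sample = '<tag a=“x” b=“y” c=“z”/>'

# ([a-z]){1} matches one lowercase ASCII letter at a time;
# findall returns the captured letter for each match.
letters = re.findall(r'([a-z]){1}', sample)
print(letters)

# {1} means "exactly once", which is already the default,
# so dropping it gives the same list of matches.
same_without_quantifier = re.findall(r'([a-z])', sample) == letters
print(same_without_quantifier)
```

The quotes, spaces, and punctuation are skipped because they fall outside the a-z range.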

If you add the literal structure of your string to the pattern:

tag ([a-z]){1}=“([a-z]){1}”

matched string: tag a=“x”
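
As a rough check of the full pattern from this answer, the sketch below runs it in Python's `re` module (an assumption, any engine with similar syntax should behave the same way) against the three orderings from the question. The `unrelated` string is a hypothetical example added here to show that the pattern accepts any single-letter attribute names and values, not only a/b/c with x/y/z.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(
    r'<tag ([a-z]){1}=“([a-z]){1}” ([a-z]){1}=“([a-z]){1}” ([a-z]){1}=“([a-z]){1}”/>'
)

orderings = [
    '<tag a=“x” b=“y” c=“z”/>',
    '<tag c=“z” b=“y” a=“x”/>',
    '<tag b=“y” a=“x” c=“z”/>',
]

# Every ordering from the question matches.
all_match = all(pattern.fullmatch(s) is not None for s in orderings)
print(all_match)

# Hypothetical tag with different attributes: it matches as well,
# because [a-z] accepts any lowercase letter.
unrelated = '<tag d=“q” e=“r” f=“s”/>'
matches_unrelated = pattern.fullmatch(unrelated) is not None
print(matches_unrelated)
```

So the pattern is order-independent, but only because it does not check which attributes are present.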

Hope this helps!

EnriqueBet
  • Thanks @CarySwoveland, I tested it on an online regex tester. But you are right, in Python the "/" should be escaped, and according to your comments the solution should be: `` – EnriqueBet Mar 11 '20 at 17:57
  • Thank you for your very instructional answer. However, I should have specified in my question that this was only a simplified example. I am looking for a specific XML tag in a DOCX file, and my problem is that MS Word randomly reorders the attributes in some instances (I believe when a user with a different system language saves the file). – Invertedchicken Mar 11 '20 at 17:58
  • The online regex engine that I am using is [regexr.com](https://regexr.com/). I understand your point on the string ``, but would this kind of string be valid in an XML or HTML document? Just trying to justify my not so good answer hahahaha :P! – EnriqueBet Mar 12 '20 at 15:28
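
Since the comments mention that the real input is XML from a DOCX file, one alternative to a regex is to parse the XML and compare attributes as a dictionary, which ignores order by construction. This is a different technique from the ones proposed in the answers, shown only as a minimal sketch with Python's standard `xml.etree.ElementTree`. The `fragment` below is hypothetical: it uses straight quotes (as real XML does) and omits the namespace prefixes that an actual DOCX part would normally carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, not taken from a real DOCX file.
fragment = (
    '<root>'
    '<tag c="z" b="y" a="x"/>'
    '<tag a="x" b="y" c="z"/>'
    '<tag a="x" b="q" c="z"/>'
    '</root>'
)

# The attribute set we are looking for, in no particular order.
wanted = {'a': 'x', 'b': 'y', 'c': 'z'}

root = ET.fromstring(fragment)

# elem.attrib is a plain dict, so equality does not depend on attribute order.
hits = [elem for elem in root.iter('tag') if elem.attrib == wanted]
print(len(hits))
```

The first two elements are found and the third is skipped because its b value differs.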

This is one way:

^<tag +([abc])=“([xyz])” +(?!\1)([abc])=“(?!\2)([xyz])” +(?!\1|\3)[abc]=“(?!\2|\4)[xyz]”\/>$

Demo

^         # match beginning of line
<tag      # match '<tag'
 +        # match 1+ spaces
([abc])   # match 'a', 'b' or 'c' in cap group 1
=“        # match '=“'
([xyz])   # match 'x', 'y' or 'z' in cap group 2
” +       # match '”' followed by 1+ spaces
(?!\1)    # following cannot match contents of cap group 1
([abc])   # match 'a', 'b' or 'c' in cap group 3
=“        # match '=“'
(?!\2)    # following cannot match contents of cap group 2
([xyz])   # match 'x', 'y' or 'z' in cap group 4
” +       # match '”' followed by 1+ spaces
(?!\1|\3) # following cannot match contents of cap group 1 or 3
[abc]=“   # match 'a', 'b' or 'c' followed by '=“'
(?!\2|\4) # following cannot match contents of cap group 2 or 4
[xyz]”\/> # match 'x', 'y' or 'z' followed by '”/>'
$         # match end of line
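
The breakdown above can be exercised with a short Python sketch. It assumes Python's `re` module and writes the closing quote as ” so that the pattern lines up with the sample strings in the question. The `repeated` and `mispaired` strings are hypothetical test inputs added here, not part of the original thread.

```python
import re

# The pattern from this answer, with ” as the closing quote (assumption).
pattern = re.compile(
    r'^<tag +([abc])=“([xyz])” +(?!\1)([abc])=“(?!\2)([xyz])”'
    r' +(?!\1|\3)[abc]=“(?!\2|\4)[xyz]”\/>$'
)

orderings = [
    '<tag a=“x” b=“y” c=“z”/>',
    '<tag c=“z” b=“y” a=“x”/>',
    '<tag b=“y” a=“x” c=“z”/>',
]

# All three orderings from the question match.
all_match = all(pattern.search(s) is not None for s in orderings)
print(all_match)

# A repeated attribute name is rejected by the (?!\1) lookahead.
repeated = '<tag a=“x” a=“x” c=“z”/>'
repeated_matches = pattern.search(repeated) is not None
print(repeated_matches)

# Names and values are checked independently, so a name paired with
# a different value still passes.
mispaired = '<tag a=“z” b=“y” c=“x”/>'
mispaired_matches = pattern.search(mispaired) is not None
print(mispaired_matches)
```

The last case suggests the pattern only guarantees that each name and each value appears once, not that a goes with x, b with y, and c with z.
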
Cary Swoveland