I am trying to use a Python regex to match all the characters between '[' and '] ' (note the trailing space). Initially I had naively matched ([^]) and then consumed the space, but in one of my test cases the text between the brackets itself contains ] characters. My example string is

127.0.0.1 - dgm [Thu Oct  2 09:55:11 2014] [9 <?xml version="1.0"?><action><signal><composite id="1" data="atm_te" format="generic" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><composite_dim dimid="0" dim="atm_r" to_dim="0" format="generic" /></composite><documentation id="1" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><description>YAG TS Te Profile</description><label>Te</label><units>[eV]</units><dimension dimid="0"><label>Major Radius</label><units>[m]</units></dimension><dimension dimid="1"><label>Time</label><units>[s]</units></dimension></documentation></signal></action> 13500 0   0 XML MAST MAST ] 0 20872 [] 44.909000 6 6 [25837 7417]

So I tried a lookahead, using the regex

(?P<host>\s|\S+)\s?- (?P<username>\s|\S+)\s?\[(?P<datetime>[^]]*)\]\s\[(?P<query>.*(?=\]\s))\]\s

but this seems to overmatch, both in my code and on the pythex web page.

The result for the query group that I am looking for is

"9 <?xml version="1.0"?><action><signal><composite id="1" data="atm_te" format="generic" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><composite_dim dimid="0" dim="atm_r" to_dim="0" format="generic" /></composite><documentation id="1" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><description>YAG TS Te Profile</description><label>Te</label><units>[eV]</units><dimension dimid="0"><label>Major Radius</label><units>[m]</units></dimension><dimension dimid="1"><label>Time</label><units>[s]</units></dimension></documentation></signal></action> 13500 0   0 XML MAST MAST "

(and yes, the trailing whitespace is important)

HungMung
  • *"overmatch"*? I don't see *any* match: https://regex101.com/r/wM1fO0/1. Why is the trailing space important? Would https://regex101.com/r/wM1fO0/2 do what you need? – jonrsharpe May 25 '16 at 10:11
  • I am using the trailing space to distinguish between the end of the _query_ group and a _]_ within the signal group. I will try the website you suggest, but it works on [pythex](http://pythex.org/) – HungMung May 25 '16 at 10:14
  • Please [edit] the question to explain what you're looking for (and why, so we can rule out [XY problems](http://meta.stackexchange.com/q/66377/248731)). – jonrsharpe May 25 '16 at 10:17
  • *Why* is the trailing white space important? Also the second link in my first comment gets that match. – jonrsharpe May 25 '16 at 10:22
  • Possible duplicate of [Regular Expression to match outer brackets](http://stackoverflow.com/questions/546433/regular-expression-to-match-outer-brackets) – wagnerpeer May 25 '16 at 10:30
  • The last trailing white space is important as it indicates a missing field. However, your solution worked fine. What I don't understand is why using lookahead matched the 'next' pattern (i.e. the one after the one I was hoping to find). Thanks for the help anyway – HungMung May 25 '16 at 10:35
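
On the last point in the comments (why the lookahead matched the "next" pattern): a likely explanation is that `.*` is greedy, so it first runs to the end of the line and then backtracks only as far as the *last* place where `(?=\]\s)` succeeds, which here is the `[] ` field rather than the `] ` that closes the query. The sketch below tries to reproduce that. It uses a shortened stand-in for the log line (an assumption: the XML is trimmed to one `<units>[eV]</units>` element), and the names `PREFIX`, `naive`, `greedy` and `lazy` are made up for illustration. The lazy `.*?` variant is one possible fix, not necessarily the one behind the regex101 link.

```python
import re

# Shortened stand-in for the log line in the question (assumption: the XML
# payload is trimmed, but it still contains a "]" not followed by a space).
line = ('127.0.0.1 - dgm [Thu Oct  2 09:55:11 2014] '
        '[9 <units>[eV]</units> 13500 0   0 XML MAST MAST ] '
        '0 20872 [] 44.909000 6 6 [25837 7417]')

# Host, username and datetime parts, copied from the regex in the question.
PREFIX = (r'(?P<host>\s|\S+)\s?- (?P<username>\s|\S+)\s?'
          r'\[(?P<datetime>[^]]*)\]\s')

# Negated class: can only stop at the first "]", which is the one in "[eV]".
naive = re.compile(PREFIX + r'\[(?P<query>[^]]*)\]\s')
# Pattern from the question: greedy .* backtracks to the LAST "] ".
greedy = re.compile(PREFIX + r'\[(?P<query>.*(?=\]\s))\]\s')
# Lazy .*? expands only until the FIRST "]" followed by whitespace.
lazy = re.compile(PREFIX + r'\[(?P<query>.*?)\]\s')

naive_match = naive.match(line)
greedy_query = greedy.match(line).group('query')
lazy_query = lazy.match(line).group('query')

print(naive_match)         # no match: "]" in "[eV]" is followed by "<"
print(repr(greedy_query))  # runs on into "... MAST ] 0 20872 ["
print(repr(lazy_query))    # stops at "... MAST " with the trailing space
```

The lazy version relies on the payload never containing a `]` immediately followed by whitespace. If that can happen, the closing bracket would have to be identified some other way, for example by anchoring on the fields that follow it.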
