I am trying to use a Python regex to match all the characters between '[' and '] ' (note the trailing space). Initially I had naively matched ([^]) and then consumed the space, but in one of my test cases the text I want to capture itself contains ] characters. My example string is
127.0.0.1 - dgm [Thu Oct 2 09:55:11 2014] [9 <?xml version="1.0"?><action><signal><composite id="1" data="atm_te" format="generic" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><composite_dim dimid="0" dim="atm_r" to_dim="0" format="generic" /></composite><documentation id="1" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><description>YAG TS Te Profile</description><label>Te</label><units>[eV]</units><dimension dimid="0"><label>Major Radius</label><units>[m]</units></dimension><dimension dimid="1"><label>Time</label><units>[s]</units></dimension></documentation></signal></action> 13500 0 0 XML MAST MAST ] 0 20872 [] 44.909000 6 6 [25837 7417]
So I tried a lookahead, using the regex
(?P<host>\s|\S+)\s?- (?P<username>\s|\S+)\s?\[(?P<datetime>[^]]*)\]\s\[(?P<query>.*(?=\]\s))\]\s
but this seems to overmatch, both in my code and on the pythex web page.
The result for the query group that I am looking for is
"9 <?xml version="1.0"?><action><signal><composite id="1" data="atm_te" format="generic" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><composite_dim dimid="0" dim="atm_r" to_dim="0" format="generic" /></composite><documentation id="1" exp_number_start="12000" exp_number_end="0" pass_start="-1" pass_end="-1"><description>YAG TS Te Profile</description><label>Te</label><units>[eV]</units><dimension dimid="0"><label>Major Radius</label><units>[m]</units></dimension><dimension dimid="1"><label>Time</label><units>[s]</units></dimension></documentation></signal></action> 13500 0 0 XML MAST MAST "
(and yes, the trailing whitespace is important)
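
A minimal sketch that reproduces the overmatch, using a shortened stand-in for the log line above (the shortened string and the names line, prefix, greedy and lazy are assumptions for illustration, not the real data). It suggests the cause is the greedy .* which backtracks only as far as the last "] " on the line, and that a lazy .*? may be enough here, since it stops at the first "] " after the opening bracket. Note that the lazy version would still stop too early if the query text itself ever contained "] ".

```python
import re

# Shortened stand-in for the real log line (an assumption for illustration).
line = (
    '127.0.0.1 - dgm [Thu Oct 2 09:55:11 2014] '
    '[9 <units>[eV]</units> 13500 0 0 XML MAST MAST ] '
    '0 20872 [] 44.909000 6 6 [25837 7417]'
)

# Everything up to and including the '[' that opens the query field.
prefix = r'(?P<host>\s|\S+)\s?- (?P<username>\s|\S+)\s?\[(?P<datetime>[^]]*)\]\s\['

# Original pattern: greedy .* grabs the rest of the line, then backtracks
# only to the LAST position that is followed by "] ".
greedy = re.compile(prefix + r'(?P<query>.*(?=\]\s))\]\s')

# Lazy variant: .*? expands one character at a time and stops at the
# FIRST "] " it can reach, so the lookahead is no longer needed.
lazy = re.compile(prefix + r'(?P<query>.*?)\]\s')

greedy_query = greedy.match(line).group('query')
lazy_query = lazy.match(line).group('query')

print(repr(greedy_query))  # runs on past "MAST ] " up to the "[" of "[] "
print(repr(lazy_query))    # ends after "MAST ", trailing space included
```

Printing with repr() makes the trailing space in the lazy result visible.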