I am trying to write a regex to match lines in a text file such as:

* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379
* 930      0025.b52a.dd7e    static    0          F    F  Veth1469

My intention is to match "0050.56aa.3480" and "Veth1379" and capture them in group(1) and group(2) for later use.

The regex I wrote is:

\*\s*\d{1,}\s*(\d{1,}\.(?:[a-z][a-z]*[0-9]+[a-z0-9]*)\.\d{1,})\s*(?:[a-z][a-z]+)\s*\d{1,}\s*.\s*.\s*((?:[a-z][a-z]*[0-9]+[a-z0-9]*))

But it does not seem to work when I test it at http://www.pythonregex.com/

Could someone point out any obvious error I am making here?

Thanks, ~Newbie

  • pythex.org is more fun, it checks your pattern in real time. – Casimir et Hippolyte Apr 15 '14 at 19:01
  • `(\d{1,}.(?:[a-z][a-z][0-9]+[a-z0-9]).\d{1,})` matches 1+ digits, any character, 2 letters, 1+ digits, 1 alphanumeric, any character, and 1+ digits. This doesn't match `0050.56aa.3480` or `0025.b52a.dd7e`. Can you define how we match this string? – Sam Apr 15 '14 at 19:04
  • A lot of people have issues when they come here asking for a regex with only an example. You need to describe the pattern to us in English. If you give us one example and say you need a regex to match it, it can be done with a very simple regex, but that probably won't match your other cases, or will match too many. – Cruncher Apr 15 '14 at 19:05
  • That is a complicated regex you've got there. – aliteralmind Apr 15 '14 at 19:13
  • @aliteralmind: it is because he doesn't use shortcuts `\d{1,}` => `\d+` and `[a-z][a-z]*` => `[a-z]+`, `(?:[a-z][a-z]+)` => `[a-z]{2,}` – Casimir et Hippolyte Apr 15 '14 at 19:15
  • @CasimiretHippolyte: I should say "verbose". – aliteralmind Apr 15 '14 at 19:18
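
To make Sam's point concrete, here is a minimal sketch that runs the pattern from the question against the two sample lines and then tests the address sub-patterns in isolation. The names `original`, `middle`, and `last` are illustrative only. The middle piece insists on a leading letter, so it rejects `56aa`, and the last piece accepts digits only, so it rejects `dd7e`.

```python
import re

# The pattern from the question, unchanged.
original = re.compile(
    r"\*\s*\d{1,}\s*(\d{1,}\.(?:[a-z][a-z]*[0-9]+[a-z0-9]*)\.\d{1,})"
    r"\s*(?:[a-z][a-z]+)\s*\d{1,}\s*.\s*.\s*((?:[a-z][a-z]*[0-9]+[a-z0-9]*))"
)

lines = [
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379",
    "* 930      0025.b52a.dd7e    static    0          F    F  Veth1469",
]

# Neither sample line matches the original pattern.
results = [original.search(line) for line in lines]

# The middle and last pieces of the address sub-pattern, tested on their own.
middle = re.compile(r"[a-z][a-z]*[0-9]+[a-z0-9]*")
last = re.compile(r"\d{1,}")

middle_ok = middle.fullmatch("56aa")  # starts with a digit, so no match
last_ok = last.fullmatch("dd7e")      # contains letters, so no match
```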

4 Answers

Try this:

^\* [0-9]{3} +([0-9]{4}\.[0-9a-z]{4}\.[0-9a-z]{4}).*(Veth[0-9]{4})$

The first part is in capture group one, the "Veth" code in capture group two.
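
As a rough usage sketch in Python, assuming the sample line from the question: the pattern has the same shape as the one in this answer, with each literal dot written as `\.` so that it cannot match an arbitrary character. The variable names `mac` and `interface` are illustrative.

```python
import re

# Literal dots are escaped; an unescaped "." would match any character.
pattern = re.compile(
    r"^\* [0-9]{3} +([0-9]{4}\.[0-9a-z]{4}\.[0-9a-z]{4}).*(Veth[0-9]{4})$"
)

line = "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379"

match = pattern.search(line)
if match:
    mac, interface = match.group(1), match.group(2)
    print(mac, interface)
```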


Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. There's a list of online testers in the bottom section.

aliteralmind
  • Thanks. I actually had to edit what you gave me above, but for the most part you were correct. This is what I have tested so far, and it works: "\*\s*([0-9]{3}\s*[0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$" – user3325980 Apr 15 '14 at 19:17
  • I think you mean `^\*\s*([0-9]{3}\s*[0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$`, but yes, that looks good. – aliteralmind Apr 15 '14 at 19:20
  • Hi, I also need to match lines such as "* 4043 4c00.82f3.95e6 static 0 F F Po1309" and "* 4044 a493.4cc7.d559 dynamic 0 F F Eth1/1/17". My new regex is "\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*([a-z].*)$", but group(2) does not match what I expect when I test at http://www.pythonregex.com/. Could you help me check what I am doing wrong? Thanks, Newbie – user3325980 Apr 15 '14 at 21:15
  • Consider asking a new question so you can get fresh help. I can't take a look now as I'm leaving, but I'll check in when I get back. Good luck. – aliteralmind Apr 15 '14 at 21:28
  • I finally got it working. As I am a newbie here, the forum does not allow me to post an answer, but here is what I used: "\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth\d+|Eth.*|Po\d+)$" Thanks for all the great help. – user3325980 Apr 15 '14 at 21:42
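
The two patterns from the comments above can be compared with a short sketch, assuming the sample lines quoted in this thread and with the literal dots escaped. In the earlier attempt, the greedy `.*` in front of `([a-z].*)` swallows as much as it can, so group(2) starts at the last lowercase letter on the line; the explicit alternation avoids that. The names `earlier` and `final` are illustrative.

```python
import re

# Earlier attempt: the greedy ".*" leaves only the tail for group 2.
earlier = re.compile(
    r"\*\s*[0-9]+\s*([0-9a-z]{4}\.[0-9a-z]{4}\.[0-9a-z]{4}).*([a-z].*)$"
)

# Final version: alternation over the three interface name shapes.
final = re.compile(
    r"\*\s*[0-9]+\s*([0-9a-z]{4}\.[0-9a-z]{4}\.[0-9a-z]{4}).*(Veth\d+|Eth.*|Po\d+)$"
)

samples = [
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379",
    "* 4043 4c00.82f3.95e6 static 0 F F Po1309",
    "* 4044 a493.4cc7.d559 dynamic 0 F F Eth1/1/17",
]

# Collect (address, interface) pairs for every line that matches.
pairs = []
for line in samples:
    m = final.search(line)
    if m:
        pairs.append((m.group(1), m.group(2)))

# What the earlier attempt captures as group 2 for the "Po1309" line.
truncated = earlier.search(samples[1]).group(2)
```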

I don't think you need a regex for this:

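with open('myfile') as f:
    for line in f:
        # Drop the leading "*" marker so fields[1] is the MAC and fields[6] the interface.
        fields = line.lstrip('* ').split()
        if len(fields) >= 7:
            print("\n" + fields[1] + "\n" + fields[6])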
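
A slightly more defensive sketch of the same split-based idea, assuming every data row starts with a `*` marker followed by seven whitespace-separated columns, as in the samples in this thread. `parse_line` is a hypothetical helper name. Because nothing is assumed about the shape of the last column, names such as `Eth1/1/17` come through unchanged.

```python
def parse_line(line):
    """Return (mac, interface) for a table row, or None for any other line."""
    stripped = line.strip()
    if not stripped.startswith("*"):
        return None
    # Remove the "*" marker so the remaining columns are numbered from 0.
    fields = stripped[1:].split()
    if len(fields) != 7:
        return None
    return fields[1], fields[6]


row = "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379"
print(parse_line(row))
```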
Casimir et Hippolyte

A very strict version would look something like this:

^\*\s+\d{3}\s+(\d{4}(?:\.[0-9a-f]{4}){2})\s+\w+\s+\d+\s+\w\s+\w\s+([0-9A-Za-z]+)$

Here I assume that:

  • the columns will be pretty much the same,
  • your first match group contains a group of decimal digits and two groups of lower-case hex digits,
  • and the last word can be anything.
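
A small sketch of how strict this is in practice, using the sample lines from the thread. Both lines from the question match, while a row with a four-digit first column and a letter in the first address group (taken from the comments on another answer) is rejected. The names `strict`, `captured`, and `other` are illustrative.

```python
import re

strict = re.compile(
    r"^\*\s+\d{3}\s+(\d{4}(?:\.[0-9a-f]{4}){2})\s+\w+\s+\d+\s+\w\s+\w\s+([0-9A-Za-z]+)$"
)

rows = [
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379",
    "* 930      0025.b52a.dd7e    static    0          F    F  Veth1469",
]

# Both rows from the question satisfy every assumption listed above.
captured = [strict.match(row).groups() for row in rows]

# "4043" is four digits and "4c00" is not all decimal, so this row is rejected.
other = strict.match("* 4043 4c00.82f3.95e6 static 0 F F Po1309")
```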

A few notes:

  • \d+ is equivalent to \d{1,} or [0-9]{1,}, but reads better (imo)
  • use \. to match a literal ., as an unescaped . matches any character (except a newline, by default)
  • [a-z]{2} is equivalent to [a-z][a-z], but reads better (my opinion, again)
  • however, you might want to use \w instead to match a word character
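
The notes above can be checked with a few lines of Python. This is only an illustration; the malformed address `0050x56aay3480` is a made-up input that shows why the escaped dot matters.

```python
import re

# \d+ and \d{1,} accept the same input.
plus_form = re.fullmatch(r"\d+", "3480")
brace_form = re.fullmatch(r"\d{1,}", "3480")

# An unescaped "." matches any character, so a malformed address slips through.
loose = re.fullmatch(r"\d{4}.\w{4}.\w{4}", "0050x56aay3480")

# With "\." only a literal dot is accepted.
tight = re.fullmatch(r"\d{4}\.\w{4}\.\w{4}", "0050x56aay3480")
valid = re.fullmatch(r"\d{4}\.\w{4}\.\w{4}", "0050.56aa.3480")

# [a-z]{2} and [a-z][a-z] are interchangeable.
two_a = re.fullmatch(r"[a-z]{2}", "aa")
two_b = re.fullmatch(r"[a-z][a-z]", "aa")
```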
Attila O.

This will do it:

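import re

# subject is the text to search; one sample line from the question is used here
subject = "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379"
reobj = re.compile(r"^.*?([\w]{4}\.[\w]{4}\.[\w]{4}).*?([\w]+)$", re.IGNORECASE | re.MULTILINE)
match = reobj.search(subject)
if match:
    group1 = match.group(1)
    group2 = match.group(2)
else:
    # No match: fall back to empty strings so both names are always defined
    group1 = group2 = ""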
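
Since `search` returns only the first match, a file with many rows needs a loop. A minimal sketch with `finditer`, assuming the whole file has been read into one string: `re.MULTILINE` lets `^` and `$` anchor at each line. One caveat is that `[\w]+` cannot cross a `/`, so an interface such as `Eth1/1/17` would be captured as `17` only.

```python
import re

reobj = re.compile(
    r"^.*?([\w]{4}\.[\w]{4}\.[\w]{4}).*?([\w]+)$",
    re.IGNORECASE | re.MULTILINE,
)

# Two sample rows joined the way they would appear in a file.
subject = (
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379\n"
    "* 930      0025.b52a.dd7e    static    0          F    F  Veth1469"
)

# One (address, interface) tuple per matching line.
pairs = [m.groups() for m in reobj.finditer(subject)]

# Caveat: only the part after the last "/" is captured for this interface.
slash_case = reobj.search("* 4044 a493.4cc7.d559 dynamic 0 F F Eth1/1/17").group(2)
```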
Pedro Lobito