Regex help to match groups

Question

I am trying to write a regex to match lines in a multi-line text file, such as:

* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379
* 930      0025.b52a.dd7e    static    0          F    F  Veth1469

My intention is to match "0050.56aa.3480" and "Veth1379" and capture them in group(1) and group(2) for later use.

The regex I wrote is :

\*\s*\d{1,}\s*(\d{1,}\.(?:[a-z][a-z]*[0-9]+[a-z0-9]*)\.\d{1,})\s*(?:[a-z][a-z]+)\s*\d{1,}\s*.\s*.\s*((?:[a-z][a-z]*[0-9]+[a-z0-9]*))

But it does not seem to be working when I test at: http://www.pythonregex.com/

Could someone point out any obvious error I am making here?

Thanks, ~Newbie

pythex.org is more fun, it checks your pattern in real time. — Casimir et Hippolyte, Apr 15 '14 at 19:01
`(\d{1,}.(?:[a-z][a-z][0-9]+[a-z0-9]).\d{1,})` matches 1+ digits, any character, 2 letters, 1+ digits, 1 alphanumeric, any character, and 1+ digits. This doesn't match `0050.56aa.3480` or `0025.b52a.dd7e`. Can you define how we match this string? — Sam, Apr 15 '14 at 19:04
A lot of people have issues when they come here asking for regex with only an example. You need to describe the pattern to us in english. If you give us one example and say you need a regex to match it, it can be done with a very simple regex, but probably won't match your other cases, or match too many. — Cruncher, Apr 15 '14 at 19:05
@aliteralmind: it is because he doesn't use shortcuts `\d{1,}` => `\d+` and `[a-z][a-z]*` => `[a-z]+`, `(?:[a-z][a-z]+)` => `[a-z]{2,}` — Casimir et Hippolyte, Apr 15 '14 at 19:15

score 2 · Accepted Answer · edited May 23 '17 at 10:25

Try this:

^\* [0-9]{3} +([0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$

The first part is in capture group one, the "Veth" code in capture group two.
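
For illustration, here is a minimal Python sketch of applying the pattern above to the two sample lines from the question. The helper name `extract` is made up for this example, and it assumes each line is matched on its own with any trailing newline stripped.

```python
import re

# Pattern copied verbatim from the answer above. Note that the unescaped "."
# matches any character, not only a literal dot.
PATTERN = re.compile(r"^\* [0-9]{3} +([0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$")

# Sample lines taken from the question.
SAMPLE_LINES = [
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379",
    "* 930      0025.b52a.dd7e    static    0          F    F  Veth1469",
]


def extract(line):
    """Hypothetical helper: return (group 1, group 2) for a matching line, else None."""
    m = PATTERN.match(line.rstrip("\n"))
    return (m.group(1), m.group(2)) if m else None


for line in SAMPLE_LINES:
    print(extract(line))
```

Because the pattern requires exactly three digits in the first column and a `Veth` name with four digits, a line such as `* 4043 4c00.82f3.95e6 static 0 F F Po1309` would not match and `extract` would return `None`.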

Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference. There's a list of online testers in the bottom section.

answered Apr 15 '14 at 19:05 by aliteralmind · last edited by Community

Thanks. I actually had to edit what you gave me above, but for the most part you were correct. This is what I have tested so far, and it works: "\*\s*([0-9]{3}\s*[0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$" – user3325980 Apr 15 '14 at 19:17
I think you mean `^\*\s*([0-9]{3}\s*[0-9]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth[0-9]{4})$`, but yes, that looks good. – aliteralmind Apr 15 '14 at 19:20
Hi, I also needed to match lines such as "* 4043 4c00.82f3.95e6 static 0 F F Po1309" and "* 4044 a493.4cc7.d559 dynamic 0 F F Eth1/1/17". My new regex is "\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*([a-z].*)$", but group(2) does not match as expected when I test at http://www.pythonregex.com/. Could you help me check what I am doing wrong? Thanks, Newbie – user3325980 Apr 15 '14 at 21:15
Consider asking a new question so you can get fresh help. I can't take a look now as I'm leaving, but I'll check in when I get back. Good luck. – aliteralmind Apr 15 '14 at 21:28
I finally got it working. As I am a newbie here, the forum is not allowing me to post an answer, but here is what I used: "\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth\d+|Eth.*|Po\d+)$" Thanks for all the great help – user3325980 Apr 15 '14 at 21:42
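
A likely reason for the group(2) trouble with the earlier pattern, offered as a sketch rather than a confirmed diagnosis: the greedy `.*` in front of `([a-z].*)` consumes as much as it can, so the last group only receives the text from the final lowercase letter onward. The comparison below uses both patterns exactly as posted in the comments; the helper `groups` and the single-space sample lines are assumptions made for this example.

```python
import re

# Both patterns are copied from the comments above.
EARLIER = re.compile(r"\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*([a-z].*)$")
FINAL = re.compile(r"\*\s*[0-9]{1,}\s*([0-9a-z]{4}.[0-9a-z]{4}.[0-9a-z]{4}).*(Veth\d+|Eth.*|Po\d+)$")

# Sample lines based on the question and comments (whitespace simplified).
LINES = [
    "* 964 0050.56aa.3480 dynamic 200 F F Veth1379",
    "* 4043 4c00.82f3.95e6 static 0 F F Po1309",
    "* 4044 a493.4cc7.d559 dynamic 0 F F Eth1/1/17",
]


def groups(pattern, line):
    """Hypothetical helper: return the captured groups, or None if there is no match."""
    m = pattern.search(line)
    return m.groups() if m else None


for line in LINES:
    # The earlier pattern truncates group 2; the final pattern captures the full name.
    print(groups(EARLIER, line), groups(FINAL, line))
```

With the earlier pattern, the second group for the `Po1309` line is only `o1309`, while the alternation in the final pattern anchors the group at the start of the interface name.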

score 2 · Answer 2 · answered Apr 15 '14 at 19:09

I don't think you need a regex for this:

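# Python 3: split each line on whitespace instead of using a regex.
with open('myfile') as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 7:  # skip blank or short lines
            print("\n" + fields[1] + "\n" + fields[6])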

answered Apr 15 '14 at 19:09 by Casimir et Hippolyte

If you pretend that the split function doesn't exist in Python, it makes my regex answer the *obvious* choice. – aliteralmind Apr 15 '14 at 19:11

Attila O. · Answer 3 · 2014-04-15T19:22:02.893

A very strict version would look something like this:

^\*\s+\d{3}\s+(\d{4}(?:\.[0-9a-f]{4}){2})\s+\w+\s+\d+\s+\w\s+\w\s+([0-9A-Za-z]+)$

Here I assume that:

the columns will be pretty much the same,
your first match group contains a group of decimal digits and two groups of lower-case hex digits,
and the last word can be anything.
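
To show what these assumptions mean in practice, here is a small Python sketch that applies the strict pattern above. The helper name `parse` is made up for this example; the sample lines come from the question and from the comments on the accepted answer.

```python
import re

# Strict pattern copied from the answer above.
STRICT = re.compile(
    r"^\*\s+\d{3}\s+(\d{4}(?:\.[0-9a-f]{4}){2})\s+\w+\s+\d+\s+\w\s+\w\s+([0-9A-Za-z]+)$"
)


def parse(line):
    """Hypothetical helper: return (group 1, group 2), or None if the line does not match."""
    m = STRICT.match(line)
    return m.groups() if m else None


# Matches: three-digit first column, decimal first block, hex blocks after it.
print(parse("* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379"))

# Does not match: the first column has four digits and the first block contains "c".
print(parse("* 4043 4c00.82f3.95e6 static 0 F F Po1309"))
```

The second call returns `None`, which is the trade-off of a strict pattern: it rejects lines that break any of the stated assumptions.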

A few notes:

`\d+` is equivalent to `\d{1,}` or (for ASCII text) `[0-9]{1,}`, but reads better (imo)
use `\.` to match a literal `.`, as an unescaped `.` matches any character (except a newline, by default)
`[a-z]{2}` is equivalent to `[a-z][a-z]`, but reads better (my opinion, again)
however, you might want to use `\w` instead to match a word character
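
These notes can be checked with a few lines of Python. This is only an illustrative sketch; the variable names and the test string `0050x56aa` are made up for the example.

```python
import re

text = "0050.56aa.3480"

# \d+ and \d{1,} find the same runs of digits.
same_digits = re.findall(r"\d+", text) == re.findall(r"\d{1,}", text)

# [a-z]{2} and [a-z][a-z] find the same pairs of letters.
same_letters = re.findall(r"[a-z]{2}", text) == re.findall(r"[a-z][a-z]", text)

# An unescaped "." matches any character, so "0050x56aa" is accepted;
# the escaped "\." only accepts a literal dot.
loose = re.fullmatch(r"\d{4}.[0-9a-f]{4}", "0050x56aa") is not None
strict = re.fullmatch(r"\d{4}\.[0-9a-f]{4}", "0050x56aa") is not None

print(same_digits, same_letters, loose, strict)
```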

score 0 · Answer 4 · answered Apr 15 '14 at 20:02

This will do it:

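import re

# `subject` is assumed to hold the text to search, e.g. the contents of the file.
reobj = re.compile(r"^.*?([\w]{4}\.[\w]{4}\.[\w]{4}).*?([\w]+)$", re.IGNORECASE | re.MULTILINE)
match = reobj.search(subject)
if match:
    group1 = match.group(1)
    group2 = match.group(2)
else:
    # No match: leave both groups empty.
    group1 = group2 = None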
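
Note that `search` returns only the first match. If every line of a multi-line string is wanted, one option is `finditer`, sketched below with the same pattern. The contents of `subject` here are the two sample lines from the question, and the name `pairs` is made up for this example.

```python
import re

# Same pattern and flags as in the answer above.
reobj = re.compile(r"^.*?([\w]{4}\.[\w]{4}\.[\w]{4}).*?([\w]+)$", re.IGNORECASE | re.MULTILINE)

# Assumed input: the two sample lines from the question, joined by newlines.
subject = (
    "* 964      0050.56aa.3480    dynamic   200        F    F  Veth1379\n"
    "* 930      0025.b52a.dd7e    static    0          F    F  Veth1469\n"
)

# With re.MULTILINE, ^ and $ anchor at each line, so finditer yields one match per line.
pairs = [(m.group(1), m.group(2)) for m in reobj.finditer(subject)]
print(pairs)
```

One caveat: `[\w]+` does not include `/`, so for an interface name like `Eth1/1/17` (mentioned in the comments on the accepted answer) the second group would hold only the trailing `17`.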

answered Apr 15 '14 at 20:02 by Pedro Lobito
