
I have the following string:

1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]

I am trying to get the following groups:

Item - 1
Food - Baby Carrots (4Kids) (3 DOLLARS)
Cost - 3
Extra - 0
required - 5

The following is my current match string that is not picking up anything:

'(?P<item>.+?)\-(?P<food>.*)\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]'

What is wrong with my attempt?

Rolando
  • possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Andy Apr 11 '14 at 14:48
  • problem is '.+?' would consume complete string. – Grijesh Chauhan Apr 11 '14 at 14:52
  • You are very close. Where do you attempt to capture the cost? I can't see it in your regex. That is, running your regex I get everything fine except the cost string is merged in with the food string. – anon582847382 Apr 11 '14 at 14:55
  • You just edited the name to include (3 DOLLARS). So you want (3 DOLLARS) as part of the food name *and* you want the 3 separately--and redundantly--captured as the cost? – aliteralmind Apr 11 '14 at 16:12

3 Answers


Your original regex:

(?P<item>.+?)\-(?P<food>.*)\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]
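As a quick sanity check (my own sketch, assuming the sample line from the question is the entire input and that Python's re module is in use), the original pattern does appear to match. The catch is that the cost ends up inside the food group instead of in a group of its own:

```python
import re

line = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

# The pattern from the question, unchanged (split over two literals for width).
original = re.compile(
    r"(?P<item>.+?)\-(?P<food>.*)\[.*?(?P<extra>\d+(\.\d+)?).*\]"
    r".*\[.*?(?P<required>\d+(\.\d+)?).*\]"
)

m = original.search(line)
groups = m.groupdict() if m else None

# groupdict() only reports the named groups, not the unnamed (\.\d+)? ones.
print(groups)
# {'item': '1', 'food': ' Baby Carrots (4Kids) (3 DOLLARS) ', 'extra': '0', 'required': '5'}
```

Note the leading and trailing spaces in the food value, and that there is no cost key at all.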

Your problems are mostly due to the fact that you are searching for any character, instead of specific ones (digits and static strings). For example: Why do you use

(?P<item>.+?)

if it's only going to be numbers? Change it to

(?P<item>[0-9]+?)

Also, the reluctant quantifier '+?' is not necessary here, since you always want the entire number: the next portion of the pattern (the dash) can never match in the middle of that number.
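To illustrate why reluctance makes no difference here, a small sketch. The multi-digit input line is hypothetical (it is not from the question); the point is only that the dash forces both quantifiers to consume the whole number:

```python
import re

# Hypothetical input with a multi-digit item number (not from the question).
line = "1234- Baby Carrots"

# Followed by the dash, both forms must take every digit to reach it.
lazy = re.match(r"^(?P<item>[0-9]+?)-", line).group("item")
greedy = re.match(r"^(?P<item>[0-9]+)-", line).group("item")

# With nothing after the quantifier, reluctance does change the result.
lazy_alone = re.match(r"[0-9]+?", line).group()
greedy_alone = re.match(r"[0-9]+", line).group()

print(lazy, greedy, lazy_alone, greedy_alone)
# 1234 1234 1 1234
```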

In addition, this should be anchored to line (input) start:

^(?P<item>[0-9]+?)

You don't need to escape the dash (although it doesn't hurt).

^(?P<item>[0-9]+?)-

Your food group (heh) is the most complicated part:

(?P<food>.*)

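It doesn't just contain any character. Based on your demo input, it only has letters, spaces, numbers, and parens, so search just for those (\w already covers letters, digits, and the underscore, so the explicit 0-9 is redundant but harmless):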

(?P<food>[\w0-9 ()]+)

Here's what we have so far:

^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+)

You'll see that this also matches the cost part (which is completely missing from your regex...I assume that's just an oversight).
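A minimal sketch of that intermediate state, assuming the sample line from the question and Python's re module. The food class allows parens, digits, and spaces, so it runs straight through the cost and only stops at the first square bracket:

```python
import re

line = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

# Only the item and food parts built so far.
partial = re.compile(r"^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+)")

partial_groups = partial.match(line).groupdict()
print(partial_groups)
# {'item': '1', 'food': 'Baby Carrots (4Kids) (3 DOLLARS) '}
```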

So add the cost, which is

  • (
  • a number
  • [space]DOLLARS)

But only capture the number:

^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\)
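Here is a short check of this step, again only a sketch against the single sample line. Once the literal cost part follows the food group, the greedy food class has to backtrack until the cost can match, so the two values come apart:

```python
import re

line = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

with_cost = re.compile(
    r"^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\)"
)

cost_groups = with_cost.match(line).groupdict()
print(cost_groups)
# {'item': '1', 'food': 'Baby Carrots (4Kids)', 'cost': '3'}
```

Note that food no longer contains "(3 DOLLARS)". If you really want that text inside the food value as well, you would have to reassemble it from the two groups.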

The rest of your regex works fine, actually, and it can be added to the end as is:

\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]

I'd recommend, however, changing the .*? to EXTRA[space] if indeed that text is always found there (and again, no need for reluctance in this case). Same with [space]COUNT, ; and REQUIRED[space]. The more you narrow things down, the easier your regex will be to debug--assuming your input is indeed that restricted.
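To make the trade-off concrete, here is a sketch comparing the loose tail with a narrowed one. The second input string is made up for illustration; it just shows the kind of text the loose version will happily accept:

```python
import re

loose = re.compile(
    r"\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]"
)
strict = re.compile(
    r"\[EXTRA (?P<extra>\d+(\.\d+)?) COUNT\]; "
    r"\[REQUIRED (?P<required>\d+(\.\d+)?) COUNT\]"
)

good = "[EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"
odd = "[foo 7 bar] junk [baz 9]"  # hypothetical malformed input

loose_good = loose.search(good)
loose_odd = loose.search(odd)
strict_good = strict.search(good)
strict_odd = strict.search(odd)

print(loose_odd.group("extra"), loose_odd.group("required"))  # 7 9
print(strict_odd)  # None
```

Whether rejecting the odd input is what you want depends on how uniform your real data is.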

Here's the final version (with an end-of-line anchor as well):

^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\) \[EXTRA (?P<extra>\d+(\.\d+)?) COUNT\]; \[REQUIRED (?P<required>\d+(\.\d+)?) COUNT\]$
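For completeness, a sketch of how the final pattern might be used from Python. The parse_line helper and the type conversions are my own additions, not part of the question; extra and required become floats only because the pattern permits a decimal part:

```python
import re

FINAL = re.compile(
    r"^(?P<item>[0-9]+?)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\) "
    r"\[EXTRA (?P<extra>\d+(\.\d+)?) COUNT\]; "
    r"\[REQUIRED (?P<required>\d+(\.\d+)?) COUNT\]$"
)

def parse_line(line):
    """Return the named groups as typed values, or None if the line does not match."""
    m = FINAL.match(line)
    if m is None:
        return None
    g = m.groupdict()
    return {
        "item": int(g["item"]),
        "food": g["food"],
        "cost": int(g["cost"]),
        "extra": float(g["extra"]),
        "required": float(g["required"]),
    }

sample = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"
print(parse_line(sample))
# {'item': 1, 'food': 'Baby Carrots (4Kids)', 'cost': 3, 'extra': 0.0, 'required': 5.0}
```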

Before analyzing your regex, this is what I came up with:

(?P<item>[0-9]+)- (?P<food>[\w ()]+) \((?P<cost>[0-9]+) DOLLARS\) \[EXTRA (?P<extra>[0-9]+) COUNT\]; \[REQUIRED (?P<required>[0-9]+) COUNT\]

All these links came from the Stack Overflow Regular Expressions FAQ.

aliteralmind

Like this:

(?P<item>.+?)\-\s(?P<food>.*?\)).*?\((?P<cost>\d)\s\w+\)\s\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]

Demo here: http://regex101.com/r/qD1rL9
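One thing worth checking with this pattern is that the cost group is a single \d. The sketch below assumes Python's re module; the build helper and the two-digit input are hypothetical and only there to show the difference. Swapping in \d+ is one possible tweak:

```python
import re

def build(cost_part):
    # Same pattern as above, with only the body of the cost group swapped in.
    return re.compile(
        r"(?P<item>.+?)\-\s(?P<food>.*?\)).*?\((?P<cost>" + cost_part + r")\s\w+\)\s"
        r"\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]"
    )

as_posted = build(r"\d")
tweaked = build(r"\d+")

sample = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"
pricey = "1- Baby Carrots (4Kids) (12 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

posted_sample = as_posted.search(sample)
posted_pricey = as_posted.search(pricey)
tweaked_pricey = tweaked.search(pricey)

print(posted_sample.group("food"), posted_sample.group("cost"))  # Baby Carrots (4Kids) 3
print(posted_pricey)                                             # None
print(tweaked_pricey.group("cost"))                              # 12
```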

aelor

As mentioned above, you are missing a capture for cost. You also need to make the food capture non-greedy and include the closing paren. My version:

(?P<Item>\d)-\s*(?P<Food>.*?\))\s*\((?P<Cost>\d*).*EXTRA\s*(?P<Extra>\d*).*REQUIRED\s*(?P<Required>\d*)

{'Food': 'Baby Carrots (4Kids)', 'Item': '1', 'Required': '5', 'Extra': '0', 'Cost': '3'}

Seems a bit faster using http://www.pythonregex.com/
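For anyone who wants to reproduce the dictionary above locally, a minimal sketch using Python's re module and the sample line from the question. The int conversion at the end is an optional extra of mine:

```python
import re

line = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

pattern = re.compile(
    r"(?P<Item>\d)-\s*(?P<Food>.*?\))\s*\((?P<Cost>\d*)"
    r".*EXTRA\s*(?P<Extra>\d*).*REQUIRED\s*(?P<Required>\d*)"
)

result = pattern.search(line).groupdict()
print(result)

# Optional: turn the numeric fields into ints, leaving Food as text.
typed = {k: (v if k == "Food" else int(v)) for k, v in result.items()}
print(typed)
```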

wwii