-1

Can someone please help me with this? I'm trying to extract a word from a given sentence when the word is a number followed by one of the units G, GM, KG, L, ML or PCS. I can match the string, but I'm not sure how to extract the complete word.

For example, if my input is "This packet contains 250G Dates", the output should be 250G. Another example: for "You paid for 2KG Apples", the output should be 2KG.

My regular expression returns only the matched string, not the complete word :(

import re
val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'
key_vals = ['G','GM','KG','L','ML','PCS']
re.findall("\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)

mnj

3 Answers

0

Try using this Regex:

\d+\s*(G|KG|GM|L|ML|PCS)\s?

It matches any substring that starts with at least one digit, followed by one of the units. Any amount of whitespace is allowed between the digits and the unit, and one optional whitespace character after the unit.

Adjust it as needed :)
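
Note that this pattern still has a capturing group, so re.findall would return only the unit. A minimal sketch of one way around that, assuming you only need the first match: use re.search and read group(0), which is the whole match. The helper name first_quantity is made up for illustration.

```python
import re

# Pattern from this answer; the parentheses form a capturing group.
pattern = re.compile(r"\d+\s*(G|KG|GM|L|ML|PCS)\s?")

def first_quantity(text):
    """Return the first number-plus-unit match in text, or None (hypothetical helper)."""
    match = pattern.search(text)
    # group(0) is the entire match; group(1) would be only the unit.
    # strip() drops the optional trailing whitespace the pattern may consume.
    return match.group(0).strip() if match else None

print(first_quantity("This packet contains 250G Dates"))  # 250G
print(first_quantity("You paid for 2KG Apples"))          # 2KG
```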

Marcin Orlowski
genius42
0

Use a non-capturing group (?:...) instead of normal parentheses. When the pattern contains no capturing group, findall returns the strings that match the whole pattern; with one capturing group, it returns only what that group matched.
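
A quick side-by-side on the string from the question may make the difference clearer. The variable names capturing and non_capturing are just for illustration.

```python
import re

val = 'FUJI ALUMN FOIL CAKE, 240G, CHCLTE'

# With a capturing group, findall returns only the group's text.
capturing = re.findall(r"\d+\s*(G|KG|GM|L|ML|PCS)", val)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)

print(capturing)      # ['G']
print(non_capturing)  # ['240G']
```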

Michael Butscher
0

This regex will not get you what you want:

re.findall("\d+\.?\d*(\s|G|KG|GM|L|ML|PCS)\s?", val)

Let's break it down:

  • \d+: one or more digits
  • \.?: a dot (optional, as indicated by the question mark)
  • \d*: zero or more digits
  • (\s|G|KG|GM|L|ML|PCS): a group of alternatives, but whitespace is listed as one of the alternatives when it should be outside the group. What you probably want is to allow optional whitespace between the number and the unit, i.e. 240G or 240 G
  • \s?: optional whitespace

A better expression for your purpose could be:

re.findall(r"\d+\s*(?:G|KG|GM|L|ML|PCS)", val)

That means: one or more digits, followed by optional whitespace and then any one of these units: G|KG|GM|L|ML|PCS.

Note the presence of ?: to indicate a non-capturing group. Without it, findall would return only the text captured by the group, i.e. G for the example string.
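
One thing to watch with any alternation like G|KG|GM: alternatives are tried left to right, so on "240GM" the G branch wins and you get 240G. Below is a rough sketch of a variant that tries longer units first, adds a word boundary after the unit and accepts decimals. The function name extract_quantities, the longest-first ordering, the \b and the decimal part are my own additions, not something the question asked for, so treat it as a starting point.

```python
import re

# Unit list taken from the question.
UNITS = ['G', 'GM', 'KG', 'L', 'ML', 'PCS']

def extract_quantities(text, units=UNITS):
    """Return every number-plus-unit token found in text (hypothetical helper)."""
    # Assumption: try longer units first so 'GM' is not cut short to 'G'.
    ordered = sorted(units, key=len, reverse=True)
    alternation = "|".join(re.escape(u) for u in ordered)
    # Digits, optional decimal part, optional whitespace, a unit, then a
    # word boundary so that '250Grams' is not reported as '250G'.
    pattern = rf"\d+(?:\.\d+)?\s*(?:{alternation})\b"
    return re.findall(pattern, text)

print(extract_quantities("This packet contains 250G Dates"))  # ['250G']
print(extract_quantities("You paid for 2KG Apples"))          # ['2KG']
print(extract_quantities("1.5 L and 240GM"))                  # ['1.5 L', '240GM']
```

If you do want matches inside longer words, drop the \b.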

Anonymous