632

My input looks something like

<xxxx location="file path/level1/level2" xxxx some="xxx">

I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?

/.*location="(.*)".*/

Does not seem to work.

random
publicRavi
  • What's your source, is it HTML or xml or something? – Oskar Kjellin Mar 23 '10 at 20:39
  • 26
    Why is this a community wiki? It's a real question. Too late now. – Ahmad Mageed Mar 23 '10 at 20:41
  • 1
    What language are you writing in? Please don't use regex for XML. There are so many better ways to parse XML – Oskar Kjellin Mar 23 '10 at 20:42
  • 3
    Not if all you want is to scan for simple attributes. Regex is appropriate and faster. – codenheim Mar 23 '10 at 20:44
  • I would say that if you for example code c# it is so much better to use linq for this. I doubt that it will be better to regex if you have a good parser – Oskar Kjellin Mar 23 '10 at 20:45
  • Well, the source is an XML file, but I grep particular tags into a text file. For my purposes, this regex will probably suffice. – publicRavi Mar 23 '10 at 21:02
  • Thanks for your answers; Daniel V gets my vote on a FIFO basis :) – publicRavi Mar 23 '10 at 21:04
  • @WiktorStribiżew there is a meta post [here](https://meta.stackoverflow.com/questions/387791) that asks whether the duplicate should be reversed, on the argument that the answers here are better than those on the duplicate. I can't judge it. Just a heads-up. – rene Jul 28 '19 at 08:17

9 Answers

1272

You need to make your regular expression non-greedy, because by default, "(.*)" will match all of "file path/level1/level2" xxx some="xxx".

Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:

/location="(.*?)"/

Adding a ? on a quantifier (?, * or +) makes it non-greedy.
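To make the difference concrete, here is a minimal sketch in Python (my choice of language; the question does not name one). The sample line is modelled on the question's input, and the variable names are only for illustration.

```python
import re

# Sample line modelled on the input shown in the question.
line = '<xxxx location="file path/level1/level2" xxxx some="xxx">'

# Greedy: .* grabs as much as possible, then backs off to the LAST quote.
greedy = re.search(r'location="(.*)"', line)

# Lazy: .*? grows one character at a time and stops at the FIRST quote.
lazy = re.search(r'location="(.*?)"', line)

print(greedy.group(1))
print(lazy.group(1))
```

The greedy capture runs on through some="xxx, while the lazy one returns just the path.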

Daniel Vandersluis
  • 36
    FWIW, in case you're using Vim, this regex needs to be a little different: instead of `.*?` it's `.\{-}` for a non-greedy match. – SooDesuNe Mar 24 '11 at 00:21
  • 53
    Thanks Daniel. **"Adding a ? on a quantifier (?, * or +) makes it non-greedy."** is a helpful tip for me. – PhatHV Aug 20 '14 at 02:30
  • 13
    The ? describes my confusion in trying to figure this out. How appropriate. – Robbie Smith Apr 18 '16 at 17:38
  • 2
    I believe you can say 'lazy' instead of 'non-greedy' – Manticore Oct 19 '16 at 20:15
  • Because the question doesn't specify a particular regex dialect, this answer should spell out that it's only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, JavaScript, etc) but not in "traditional" regex engines (including Awk, `sed`, `grep` without `-P`, etc). – tripleee Jul 08 '20 at 17:52
59

location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy. So you either need .*? (i.e. make it non-greedy) or better replace .* with [^"]*.
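One reason the negated class can be the safer choice: a lazy .*? will still expand across quotes if that is the only way for the rest of the pattern to match, whereas [^"]* can never cross one. A small sketch in Python, assuming a made-up input line and a made-up trailing ` some=` in the pattern purely to force the issue:

```python
import re

# Hypothetical line where another attribute sits between location and some.
line = '<a location="p" other="q" some="r">'

# Lazy dot-star: keeps expanding until '" some=' can match, crossing quotes.
lazy = re.search(r'location="(.*?)" some=', line)

# Negated class: stops at the first quote, so the pattern simply fails here.
strict = re.search(r'location="([^"]*)" some=', line)

print(lazy.group(1))
print(strict)
```

Here the lazy pattern captures more than the attribute value, while the negated-class pattern reports no match instead of a wrong one.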

sepp2k
34

How about

.*location="([^"]*)".*

This avoids the unlimited search with .* and will match exactly to the first quote.

One Man Crew
user193690
  • 1
    Due to [discrepancies in grep](https://stackoverflow.com/questions/23454172/non-greedy-matching-with-grep) the above should be the preferred pattern if portability is a concern. – Josh Habdas Aug 13 '18 at 05:44
27

Use non-greedy matching, if your engine supports it. Add the ? inside the capture.

/location="(.*?)"/
codenheim
15

Using the lazy quantifier ? without the global flag is the answer: the pattern then returns only the first, shortest match.

If you add the global flag /g, it instead returns every shortest match in the input.
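Python has no /g flag, so as a rough analogue this sketch uses re.search for the "no global flag" case and re.findall for the "global" case. The two-tag input string is invented for illustration.

```python
import re

# Hypothetical input with two location attributes on one line.
text = '<a location="one"> <b location="two">'

# Like matching without /g: only the first, shortest match.
first = re.search(r'location="(.*?)"', text).group(1)

# Like matching with /g: every shortest match, left to right.
all_lazy = re.findall(r'location="(.*?)"', text)

# For contrast, the greedy form swallows both attributes as one match.
all_greedy = re.findall(r'location="(.*)"', text)

print(first)
print(all_lazy)
print(all_greedy)
```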

Uddhav Gautam
2

Because you are using a quantified subpattern. As described in the Perl documentation:

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?" . Note that the meanings don't change, just the "greediness":

*?        //Match 0 or more times, not greedily (minimum matches)
+?        //Match 1 or more times, not greedily

Thus, to make your quantified pattern match as little as possible, follow the quantifier with ? :

/location="(.*?)"/
Mohammad Kanan
2

Here's another way.

Here's the one you want. This is lazy [\s\S]*?

The first item: [\s\S]*?(?:location="([^"]*)")[\s\S]* Replace with: $1

Explanation: https://regex101.com/r/ZcqcUm/2


For completeness, this gets the last one. This is greedy [\s\S]*

The last item: [\s\S]*(?:location="([^"]*)")[\s\S]* Replace with: $1

Explanation: https://regex101.com/r/LXSPDp/3


There's only 1 difference between these two regular expressions and that is the ?

Ste
1

The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The non-greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.

If your stopping condition is a single character, the solution is easy; instead of

a(.*?)b

you can match

a[^ab]*b

i.e. specify a character class which excludes the starting and ending delimiters.

In the more general case, you can painstakingly construct an expression like

start(|[^e]|e(|[^n]|n(|[^d])))end

to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
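As an illustration of the idea, here is a sketch in Python that compares a lazy pattern with a hand-built expression using only greedy operators and plain groups. Note that the expression below is not the one shown above: it is my own construction, derived from a three-state automaton ("nothing seen", "just saw e", "just saw en") for text that does not contain end. The names LAZY, NO_END, PORTABLE and between are hypothetical.

```python
import re

# Lazy version, for engines that have Perl-style quantifiers.
LAZY = re.compile(r'start(.*?)end')

# Matches any text that does not contain "end":
#   [^e]                         a character that starts nothing
#   e(e|ne)*([^en]|n[^de])       an "e..." run that safely falls back
#   (e(e|ne)*n?)?                an optional trailing "e" or "en" run
NO_END = r'([^e]|e(e|ne)*([^en]|n[^de]))*(e(e|ne)*n?)?'

# Only greedy operators and plain groups; group 1 is the text in between.
PORTABLE = re.compile('start(' + NO_END + ')end')

def between(pattern, text):
    """Return the text captured between start and end, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

for sample in ['start the end of end', 'startenend', 'start see the end']:
    print(repr(between(LAZY, sample)), repr(between(PORTABLE, sample)))
```

Both patterns should agree on each sample. The portable form is much harder to read, which is the trade-off being described.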

Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.

tripleee
0
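import re  # the standard library is enough; the third-party regex module also works

text = 'ask her to call Mary back when she comes back'
# (?i) ignores case, (?s) lets . match newlines; .*? is lazy, so it stops at the first "back"
p = r'(?is)call(.*?)back'
for match in re.finditer(p, text):
    # strip() drops the spaces captured on either side of the name
    print(match.group(1).strip())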

Output: Mary