1

I am new to Python and was going through the "Google for Education" Python course.

Now, the line below confuses me:

* -- 0 or more occurrences of the pattern to its left

(all the examples are in python3)

e.g. 1

In [1]: re.search(r"pi*", "piiig!!").group()
Out[1]: 'piii'

This is fine since "pi" has 1 occurrence, so it is returned.

e.g. 2

In [2]: re.search(r"i*", "piiig!!").group()
Out[2]: ''

Why does it not return "i"? In fact, from my understanding, it should return "iii". But the result is an empty string.

Also, what exactly does "0 or more" mean? I searched on Google, but everywhere it just says * -- 0 or more. But if 0 occurrences of an expression count as a match, doesn't the pattern match even when the expression is not there? What is the point of searching then?

I am so confused by this. Can you please help me by explaining this or pointing me in the right direction?

I hope the right explanation will also resolve this related issue:

In [3]: re.search(r"i?", "piiig!!").group()
Out[3]: ''

I have tried the examples in Spyder 3.2.4

wp78de
  • The point of "0 or more" is mostly because you can concatenate regular expressions. So `pi*g` matches `pg` or `pig` or `piiiig` but not `peg` or `poig`. – Jesin Nov 16 '17 at 05:41
  • There's a regexp tutorial at regular-expression.info. I suggest you go through it. – Barmar Nov 16 '17 at 07:00
  • @Wiktor Stribiżew I think this question is in the sense of talking about the specific behavior of `re.search()` with a `*` not a duplicate. But the title has to be changed. – wp78de Nov 16 '17 at 07:14

4 Answers

0

You need to use * (0 or more) and + (1 or more) properly to get your desired output.

Eg: 1 Matches because * applies only to "i", so the pattern captures "p" followed by any number of "i" characters ("p", "pi", "pii", and so on).

Eg: 2 If you need to match only "i" you need to use "+" instead of "*".

If you use "*"

In: re.search(r"pi*g", "piiig!!").group()

This will match if your input contains "pg", "pig" or "piig" (zero or more "i" between "p" and "g").

If you use "+"

In: re.search(r"pi+g", "piiig!!").group()

This will match if your input contains "pig" or "piig", but not "pg" (at least one "i" is required).
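To make the difference concrete, here is a small sketch that tests both patterns against a handful of inputs. The list of candidate strings is my own choice for illustration, and I use re.fullmatch so that the whole string has to fit the pattern:

```python
import re

# Hypothetical test inputs, chosen only to illustrate the two quantifiers.
candidates = ["pg", "pig", "piiig", "peg", "poig"]

# fullmatch succeeds only if the entire string fits the pattern.
star = {s: re.fullmatch(r"pi*g", s) is not None for s in candidates}
plus = {s: re.fullmatch(r"pi+g", s) is not None for s in candidates}

for s in candidates:
    print(f"{s!r:8} pi*g: {star[s]!s:5}  pi+g: {plus[s]}")
```

The only input where the two patterns disagree is "pg": it has zero "i" characters, which * accepts and + rejects.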

Maliq
0

The special character * means 0 or more occurrences of the preceding character. E.g., a* matches 0 or more occurrences of a, which could be '', 'a', 'aa', etc. Your search returns '' because the string has 0 occurrences of i at its very start, and that already counts as a match. To get iii you should use + instead of *; that gives you the first non-empty sequence of 'i', which is iii.

re.search("i+", "piiig!!").group()
Barmar
  • I know what the "+" operator does. "*" is 0 or more, so what is the significance of using it? That was my question. –  Nov 18 '17 at 03:52
0

Because '' is the first matched result of r'i*' and 'iii' is the second matched result.

In [1]: import re

In [2]: re.findall(r'i*', 'piiig!!')
Out[2]: ['', 'iii', '', '', '', '']

This website also explains how regular expressions work: https://regex101.com/r/XVPXMv/1
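If you want to see where each match sits in the string, re.finditer gives you match objects with positions. This is only a sketch to illustrate the ordering; the number of trailing empty matches may differ between Python versions, so I only rely on the first two here:

```python
import re

# Each tuple is (start, end, matched_text) for one match of i* in the string.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"i*", "piiig!!")]

for start, end, text in spans:
    print(start, end, repr(text))
```

The first tuple is the empty match at index 0 (in front of "p"), and the second one is 'iii' covering indices 1 to 4.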

Calvin Wu
0

The explanation is a bit more complicated than the answers we have seen so far.

First, unlike re.match(), the primitive operation re.search() checks for a match anywhere in the string (this is what Perl does by default) and finds the pattern once:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. See: Ref.
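The distinction between "no match" and "zero-length match" in that quote can be checked directly. A minimal sketch, using "pg" as a made-up input that contains no "i" at all:

```python
import re

no_match = re.search(r"i+", "pg")     # "i+" needs at least one "i": nothing fits
empty_match = re.search(r"i*", "pg")  # "i*" accepts zero "i"s: matches '' at index 0

print(no_match)             # None
print(empty_match.span())   # (0, 0)
print(repr(empty_match.group()))
```

So with * you get a real match object whose text happens to be empty, while with + you get None. That is why code that checks for a match should test `is not None` rather than look at the matched text.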

If we follow every step of the regex engine while it tries to find a match, we can observe the following for the pattern i* and the test string piigg!!:

RegExBuddy Debug till End Output

As you can see, the first position (0) already produces a match: at p the pattern matches i zero times, so the result is an empty match (and not p, because we do not search for p or any other character).
At the second character (position 1), the second match (spanning through position 2) is found, since ii is zero or more times i. At position 3 there is another empty match, and so on and so forth.

Because re.search only returns the first match, it sticks with the first empty match at position 0. That's why you get the (confusing) result you have posted:

In [2]: re.search(r"i*", "piiig!!").group()
Out[2]: ''
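To see that the empty result really comes from position 0, you can print the span of the first match for a few patterns from the question side by side. This is just an illustrative sketch; the choice of patterns to compare is mine:

```python
import re

text = "piiig!!"
results = {}
for pattern in (r"i*", r"i?", r"i+", r"pi*"):
    m = re.search(pattern, text)  # every one of these patterns matches somewhere
    results[pattern] = (m.group(), m.span())
    print(f"{pattern:4} -> {m.group()!r:7} at {m.span()}")
```

Both i* and i? are satisfied by zero characters, so they succeed immediately at index 0 with an empty match. i+ cannot match at index 0, so the engine moves on and finds 'iii' at indices 1 to 4.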

In order to match every occurrence, you need re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match. See: Ref.
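In practice you usually do not want the empty matches. Two possible ways around them, sketched here under the assumption that you only care about the runs of "i": filter the findall result, or use + so that empty matches never occur in the first place.

```python
import re

text = "piiig!!"

all_matches = re.findall(r"i*", text)       # includes empty strings
non_empty = [m for m in all_matches if m]   # drop the empty matches afterwards
with_plus = re.findall(r"i+", text)         # or never produce them at all

print(all_matches)
print(non_empty)
print(with_plus)
```

The exact number of empty strings in the first list may vary between Python versions, but the filtered list and the i+ list should both contain just 'iii'.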

wp78de
  • Thank you for such a good explanation. This is exactly what I was looking for. The other answers just restate what is written in the course document, but your answer explains the how and why. –  Nov 18 '17 at 03:59
  • Also, https://regex101.com/ is a great place to test out and understand regex. ty –  Nov 18 '17 at 04:11