I have a text from which I want to extract the text that occurs between two match strings. But when I try to extract it using a regex, instead of returning all the matches in that range, it returns only the longest one.

Here is what I have tried:

import re

string = "\n3.5\nFerguson to Gayle, FOUR, short of length delivery on the stump, 115.9km/h, Gayle clears that leg and clubs it off the toe-end just over mid-on to the long-on fence for four\n3.4\nFerguson to Gayle, no run, yorker length delivery, inch-perfect, 148.2km/h, digs it out off the toe-end\n3.3\nFerguson to Gayle, no run, slower delivery on a good length heading for middle, Gayle delays his defence from the crease back down the wicket\n3.3\nFerguson to Gayle, wide, 148.1km/h, wide down leg, swings away after the ball crosses the batsman's missed flick and swerves to the left of the 'keeper who has done well to collect it\n3.2\n"
re.findall("3.4(.*)3.3", string, re.DOTALL)

What I want is all the matches, i.e. from `\n3.4\nFerguson` to `wicket\n3.3` and also from `\n3.4\nFerguson` to `toe-end\n3.3`, i.e. both occurrences. However, my code only gives me the longest one. Is there any way to do this? Any help would be highly appreciated.

Note: Please understand that what I want is every possible match of the pattern; adding `?` gives only the first (shortest) one.
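
To make the behaviour described above concrete, here is a minimal sketch on a shortened stand-in string. The name `text` and its contents are illustrative assumptions that only mimic the marker layout of the real commentary, and the dots are escaped so that `3.4` is matched literally.

```python
import re

# Hypothetical shortened sample: same 3.5 / 3.4 / 3.3 / 3.3 / 3.2 marker layout
# as the real commentary, with the long sentences replaced by short stubs.
text = "\n3.5\nA\n3.4\nB toe-end\n3.3\nC wicket\n3.3\nD\n3.2\n"

# Greedy: .* runs to the end and backtracks to the LAST "3.3".
greedy = re.findall(r"3\.4(.*)3\.3", text, re.DOTALL)

# Lazy: .*? stops at the FIRST "3.3" after "3.4".
lazy = re.findall(r"3\.4(.*?)3\.3", text, re.DOTALL)

print(greedy)  # ['\nB toe-end\n3.3\nC wicket\n']
print(lazy)    # ['\nB toe-end\n']
```

Each call returns a single match, because `re.findall` reports non-overlapping matches and both candidates start at the same `3.4`.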

Himanshu Poddar
GBDGBDA
  • You need to use a non-greedy regex so that it captures as little as possible instead of as much as possible. Use `3.4(.*?)3.3` instead of `3.4(.*)3.3` – Pushpesh Kumar Rajwanshi Apr 12 '19 at 15:58
  • But I am asking for all possible patterns, I already knew that – GBDGBDA Apr 12 '19 at 16:00
  • Do you want to match the content from `3.4` to `3.3` and then the next content from `3.3` to `3.3`? – Pushpesh Kumar Rajwanshi Apr 12 '19 at 16:05
  • No, please read the question; I have mentioned it: `\n3.4\nFerguson` to `wicket\n3.3` and the other `\n3.4\nFerguson` to `toe-end\n3.3` – GBDGBDA Apr 12 '19 at 16:07
  • You need to clearly state the current output vs. the expected output in your post, otherwise your post will end up being closed like this one. The output you are expecting is not clear from your post. – Pushpesh Kumar Rajwanshi Apr 12 '19 at 16:11
  • @PushpeshKumarRajwanshi will keep in mind the next time – GBDGBDA Apr 12 '19 at 16:15
  • @Wiktor: I don't think he wants overlapping matches. Instead he wants matches that will be generated by greedy as well as non-greedy regex. – Pushpesh Kumar Rajwanshi Apr 12 '19 at 16:17
  • See [this demo](https://ideone.com/rq0IYe). The point is that you cannot match several times at the same location. If there is a chance of only one continuation, you may also use something like `re.findall(r'(?s)((3\.4.*?3\.3)(?:.*?3\.3)?)', string)` (see [this demo](https://ideone.com/2WkZMD)). – Wiktor Stribiżew Apr 12 '19 at 16:18
  • @PushpeshKumarRajwanshi It is not about usual overlapping matches, but about those overlapping matches that start at the same location in the string. – Wiktor Stribiżew Apr 12 '19 at 16:19
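
Since, as the comments note, a single regex pass cannot report several matches that start at the same location, one possible workaround is to locate the start and end markers separately and pair them up in plain Python. The sketch below is an illustration under assumptions, not something proposed in the thread: `all_spans` is a hypothetical helper name and `text` is a shortened stand-in for the real commentary.

```python
import re

def all_spans(text, start_pat, end_pat):
    """Return every substring running from a start_pat match to any later end_pat match."""
    starts = list(re.finditer(start_pat, text))
    ends = list(re.finditer(end_pat, text))
    return [
        text[s.start():e.end()]
        for s in starts
        for e in ends
        if e.start() >= s.end()  # the end marker must begin after the start marker
    ]

# Hypothetical shortened sample with the same marker layout as the question's string.
text = "\n3.5\nA\n3.4\nB toe-end\n3.3\nC wicket\n3.3\nD\n3.2\n"

results = all_spans(text, r"3\.4", r"3\.3")
for r in results:
    print(repr(r))
# '3.4\nB toe-end\n3.3'
# '3.4\nB toe-end\n3.3\nC wicket\n3.3'
```

This yields both the short and the long span from the same `3.4`. The number of results grows with the product of start and end marker counts, so it is best suited to modest inputs.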

0 Answers