
I want to extract the file extension in Groovy with a regex, for example from South.6987556.Input.csv.cop.

http://www.regexplanet.com/advanced/java/index.html shows me that, with (.*)\.(.*), the second group does contain the cop extension, which is what I want.

0: [0,27] South.6987556.Input.csv.cop
1: [0,23] South.6987556.Input.csv
2: [24,27] cop

I just don't understand why the result isn't:

0: [0,27] South.6987556.Input.csv.cop
1: [0,5] South
2: [6,27] 6987556.Input.csv.cop

What should be the regex to get this kind of result?

dda
Radek
  • The reason is that (.*) is "greedy": it will gobble up as many characters as possible. To make it non-greedy, add a question mark: `(.*?)\.(.*)`. – Alec Jul 28 '14 at 06:08
  • @AvinashRaj It's in the title. – Alec Jul 28 '14 at 06:09
  • Does the last `?` belong to the regex or not? Why don't you play with [regex101](http://regex101.com/r/sQ0kW4/1)? – Avinash Raj Jul 28 '14 at 06:09
  • Please see http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean ... [You also probably want `(.*?)\.(.*)`](http://stackoverflow.com/questions/24989252/could-someone-explain-reg-exp/24989310#comment38850267_24989252) – Unihedron Jul 28 '14 at 06:11
  • @AvinashRaj: no, ? is not part of the reg exp. I updated the title – Radek Jul 28 '14 at 06:13
  • @Alec: could you turn your comment into an answer so I can accept it? – Radek Jul 28 '14 at 06:16

2 Answers


To get the desired output, your regex should be:

((.*?)\.(.*))

DEMO

See the captured groups at the bottom right of the DEMO site.

Explanation:

(         group and capture to \1:
  (       group and capture to \2:
    .*?   any character except \n (0 or more
          times); the ? after * makes the regex engine
          do a non-greedy match (shortest possible match)
  )       end of \2
  \.      '.'
  (       group and capture to \3:
    .*    any character except \n (0 or more
          times)
  )       end of \3
)         end of \1
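
To check the group numbering outside the demo site, here is a minimal Java sketch (Groovy uses the same java.util.regex engine). The class and method names (NestedGroups, describe) are made up for illustration. It prints each group with its offsets in the same layout as the question. Note that the outer parentheses make group 1 equal to the whole match, so South ends up in group 2 and the rest in group 3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class NestedGroups {
    // Lists every group of a full match as "index: [start,end] text", one per line.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append(i).append(": [")
              .append(m.start(i)).append(',').append(m.end(i))
              .append("] ").append(m.group(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.print(describe("((.*?)\\.(.*))", "South.6987556.Input.csv.cop"));
        // 0: [0,27] South.6987556.Input.csv.cop
        // 1: [0,27] South.6987556.Input.csv.cop
        // 2: [0,5] South
        // 3: [6,27] 6987556.Input.csv.cop
    }
}
```

Dropping the outer parentheses, i.e. (.*?)\.(.*), shifts South to group 1 and the remainder to group 2.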
dda
Avinash Raj

Here is a visualization of this regex

(.*)\.(.*)

Regular expression visualization

Debuggex Demo

in words

  • (.*) matches anything, as much as possible, and captures it
  • \. matches one period, not captured (no parentheses)
  • (.*) matches anything again, may be empty, and captures it

in your case this is

  • (.*) : South.6987556.Input.csv
  • \. : .
  • (.*) : cop

The result isn't South and 6987556.Input.csv.cop because the first (.*) is greedy, not optional, and must be followed by a period, so the engine matches the longest possible string that still leaves a period after it: everything up to the last period.

Your intended result is produced by this regex: (.*?)\.(.*). The ? after a quantifier (in this case *) switches it to non-greedy (lazy) matching, so the engine looks for the shortest string that still allows the rest of the pattern to match. By default, quantifiers in most regex engines are greedy.
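
To see both behaviours side by side, here is a small Java sketch (Groovy uses the same java.util.regex engine). The class and method names (GreedyVsLazy, groups) are hypothetical, chosen only for this example. It runs the greedy and the lazy pattern against the file name from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class GreedyVsLazy {
    // Returns groups 0..groupCount() of a full match, or null if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String name = "South.6987556.Input.csv.cop";
        // Greedy: group 1 runs up to the last period, so group 2 is the extension.
        System.out.println(String.join(" | ", groups("(.*)\\.(.*)", name)));
        // South.6987556.Input.csv.cop | South.6987556.Input.csv | cop

        // Lazy: group 1 stops at the first period.
        System.out.println(String.join(" | ", groups("(.*?)\\.(.*)", name)));
        // South.6987556.Input.csv.cop | South | 6987556.Input.csv.cop
    }
}
```

So if the goal is the file extension, the greedy version is the one to keep; the lazy version is what splits off the first segment.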

bukart