I'm stuck trying to capture a structure like this:

1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå

I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:

match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå

Here's what I have tried:

\d+\:\d+.+

But that fails when the text belonging to one match spans two lines.

I'm using a JavaScript-based regex engine.
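
A minimal reproduction of the problem, assuming the sample lines above are joined with `\n` into a single string (the variable names are illustrative only):

```javascript
// Sample text from the question, joined with "\n" (an assumption about the input).
const text = [
  "1:1 wefeff qwefejä qwefjk",
  "dfjdf 10:2 jdskjdksdjö",
  "12:1 qwe qwe: qwertyå",
].join("\n");

// Without the s (dotAll) flag, "." does not match line terminators,
// so every match stops at the end of the line it started on.
const attempt = text.match(/\d+\:\d+.+/g);

console.log(attempt);
// The first match ends at "qwefjk"; "dfjdf" on the next line is not attached to it.
```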

Antti
  • Something like `(?s)\d+:\d+(?:(?!\d+:\d).)*` should work. – Wiktor Stribiżew Feb 22 '17 at 19:06
  • The `.` is any character except new lines, unless the `s` modifier is set. – chris85 Feb 22 '17 at 19:07
  • @Wiktor Stribiżew That did work! Do you want to make an answer out of it? A brief explanation would be greatly appreciated if you have the time! – Antti Feb 22 '17 at 19:14
  • @Antti: I posted the regex solution adjusted for JavaScript, since the tag was added after my initial comment. I also added an unrolled regex version that will work regardless of the modifiers used with the regex. – Wiktor Stribiżew Feb 22 '17 at 20:54

2 Answers

You may use a regex based on a tempered greedy token:

/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g

The \d+:\d+ part matches one or more digits, a colon, and one or more digits. Then (?:(?!\d+:\d)[\s\S])* matches any character, zero or more times, as long as that character does not start a sequence of one or more digits followed by a colon and a digit. See this regex demo.
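
As a rough illustration, the pattern can be run against the sample text from the question. This is a sketch that assumes the lines are joined with `\n`; the trimming and whitespace folding are added here only to make the output comparable with the expected matches and are not part of the regex itself:

```javascript
// Sample text from the question, joined with "\n" (an assumption about the input).
const sample =
  "1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå";

// Tempered greedy token: consume any char ([\s\S] also covers line breaks)
// as long as it does not begin another "digits:digit" marker.
const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;

// Each raw match keeps its trailing whitespace and inner line breaks,
// so trim it and fold whitespace runs into single spaces for display.
const temperedMatches = (sample.match(tempered) || []).map((m) =>
  m.trim().replace(/\s+/g, " ")
);

console.log(temperedMatches);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
```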

As the tempered greedy token is a resource-consuming construct (it runs a lookahead before every character), you can unroll it into a more efficient pattern like

/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g

See another regex demo.

Now, the tempered greedy token (?:(?!\d+:\d)[\s\S])* is turned into a pattern that matches strings linearly:

  • \D* - 0+ non-digit symbols
  • (?: - start of a non-capturing group matching zero or more sequences of:
    • \d - a digit that is...
    • (?!\d*:\d) - not followed by 0+ digits, : and a digit
    • \D* - 0+ non-digit symbols
  • )* - end of the non-capturing group.
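
The breakdown above can be checked with a small sketch that runs both patterns on the same input. The input is the sample from the question plus a hypothetical stray number ("room 42") appended to the second line, added here only to exercise the `\d(?!\d*:\d)` branch; the comparison holds for this input and is not a proof of equivalence in general:

```javascript
// Question sample plus a hypothetical "room 42" (not in the original text),
// so that a digit which does NOT start a "digits:digit" marker appears.
const input = [
  "1:1 wefeff qwefejä qwefjk",
  "dfjdf 10:2 jdskjdksdjö room 42",
  "12:1 qwe qwe: qwertyå",
].join("\n");

const temperedRe = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolledRe = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

const temperedOut = input.match(temperedRe) || [];
const unrolledOut = input.match(unrolledRe) || [];

// Both patterns split this input at the same places.
const sameMatches =
  JSON.stringify(temperedOut) === JSON.stringify(unrolledOut);

console.log(sameMatches); // true
console.log(unrolledOut.map((m) => m.trim()));
```

The stray "42" stays inside the second match because neither digit is followed by optional digits, a colon and a digit, so the negative lookahead lets it through.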
Wiktor Stribiżew

You can include ñ/Ñ or leave them out, but this should work:

\d+?:\d+? [a-zñA-ZÑ ]*

Edited:

If you want to include line breaks, you can add \n or \r to the set:

\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]* 

Give it a try! I also tested it at https://regex101.com/

For more characters: ^[a-zA-Z0-9!@#\$%\^\&*)(+=._-]+$

Felipe Quirós
  • Doesn't capture "dfjdf" – Antti Feb 22 '17 at 19:34
  • Now the problem is that it stops at special characters in the text like : and ä. I'll edit my question to contain some special characters as well – Antti Feb 22 '17 at 19:54
  • Do you have a list of the special characters that could be there? When you have to add those to a regex, you usually use a set. This is the most accurate, but "short", commonly used set of characters: ^[a-zA-Z0-9!@#\$%\^\&*\)\(+=._-]+$ – Felipe Quirós Feb 22 '17 at 20:02