Capture between pattern of digits

Question

I'm stuck trying to capture a structure like this:

1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå

I want each match to start at a run of digits, followed by a colon, followed by another run of digits, and to include everything up to the next such pattern. So the expected output would be:

match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå

Here's what I have tried:

\d+\:\d+.+

But that fails if there are word characters spanning two lines.

I'm using a JavaScript-based regex engine.

The `.` is any character except new lines, unless the `s` modifier is set. — chris85, Feb 22 '17 at 19:07
@Wiktor Stribiżew That did work! Do you want to make an answer out of it? A brief explanation would be greatly appreciated if you have the time! — Antti, Feb 22 '17 at 19:14
@Antti: I posted the regex solution adjusted for JavaScript, since the tag was added after my initial comment. I also added an unrolled regex version that will work regardless of the modifiers used with the regex. — Wiktor Stribiżew, Feb 22 '17 at 20:54
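
To illustrate chris85's point, here is a minimal sketch of how the attempt from the question behaves on the sample text. The variable names are made up for illustration, and the second pattern assumes an engine that supports the `s` (dotAll) flag, which was added to JavaScript in ES2018:

```javascript
// Sample text from the question, with the lines joined by "\n".
const input = '1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå';

// The attempt from the question: "." does not match line breaks,
// so every match ends at the end of its line and "dfjdf" is lost.
const lineBound = input.match(/\d+\:\d+.+/g) || [];
console.log(lineBound[0]); // "1:1 wefeff qwefejä qwefjk"

// With the s flag (assumed to be available), "." also matches line breaks,
// but the greedy ".+" then runs to the end of the input in one match.
const dotAllMatches = input.match(/\d+:\d+.+/gs) || [];
console.log(dotAllMatches.length); // 1
```

So the `s` modifier alone is not enough here: the match also has to stop before the next digits-colon-digits sequence.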

score 1 · Accepted Answer · edited May 23 '17 at 11:46

You may use a regex based on a tempered greedy token:

/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g

The \d+:\d+ part matches one or more digits, a colon, and one or more digits. The (?:(?!\d+:\d)[\s\S])* part then matches any char, zero or more times, as long as that char does not start a sequence of one or more digits followed by a colon and a digit. See this regex demo.
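
As a rough usage sketch (not part of the original demo), the pattern can be applied to the sample text from the question with `String.prototype.match`. The variable names are illustrative, and the whitespace clean-up is an extra step assumed here so the result lines up with the expected output in the question:

```javascript
// Sample input from the question, joined with newlines.
const text = [
  '1:1 wefeff qwefejä qwefjk',
  'dfjdf 10:2 jdskjdksdjö',
  '12:1 qwe qwe: qwertyå',
].join('\n');

// Tempered greedy token: consume any char that does not start "digits:digit".
const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;

// With the g flag, match() returns every match, or null when there is none.
// Raw matches keep the line breaks and trailing whitespace, so collapse them.
const temperedMatches = (text.match(tempered) || [])
  .map((m) => m.replace(/\s+/g, ' ').trim());

console.log(temperedMatches);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
```

Note that `qwe:` in the last line does not end the match, because the colon there is not preceded by a digit.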

As the tempered greedy token is a resource-consuming construct (the lookahead runs before every single char), you can unroll it into a more efficient pattern like

/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g

See another regex demo.

Now, the tempered greedy token (?:(?!\d+:\d)[\s\S])* is turned into a pattern that matches strings linearly:

\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
- \d - a digit that is...
- (?!\d*:\d) - not followed with 0+ digits, : and a digit
- \D* - 0+ non-digit symbols
)* - end of the non-capturing group.
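
The breakdown above can be checked with a small sketch that runs both patterns on the sample text from the question and compares the raw matches. The variable names are illustrative only, and agreement on this one input is of course not a proof of equivalence in general:

```javascript
// Sample text from the question.
const sample = '1:1 wefeff qwefejä qwefjk\ndfjdf 10:2 jdskjdksdjö\n12:1 qwe qwe: qwertyå';

// Tempered greedy token version and its unrolled counterpart.
const temperedRe = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolledRe = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Raw matches, including line breaks and trailing whitespace.
const viaTempered = sample.match(temperedRe) || [];
const viaUnrolled = sample.match(unrolledRe) || [];

console.log(viaUnrolled.length); // 3
// Both patterns produce identical matches on this input.
console.log(JSON.stringify(viaTempered) === JSON.stringify(viaUnrolled)); // true
```

The unrolled version uses \D instead of [\s\S], and \D matches line breaks too, so it does not depend on any modifier.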

score 0 · Answer 2 · Felipe Quirós · 2017-02-22T20:02:19.943

You can include ñ/Ñ in the set or leave it out, but this should work:

\d+?:\d+? [a-zñA-ZÑ ]*

Edited:

If you want to include line breaks, you can add \n or \r to the set:

\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*

Give it a try! I also tested it at https://regex101.com/

For more characters: ^[a-zA-Z0-9!@#\$%\^\&*)(+=._-]+$

edited Feb 22 '17 at 20:02

answered Feb 22 '17 at 19:30

Doesn't capture "dfjdf" – Antti Feb 22 '17 at 19:34
Now the problem is that it stops at special characters in the text like : and ä. I'll edit my question to contain some special characters as well – Antti Feb 22 '17 at 19:54
Do you have a list of the special chars that could be there? When you have to add those to a regex, you usually use a set. This is the most accurate, but "short", commonly used set of chars: ^[a-zA-Z0-9!@#\$%\^\&*\)\(+=._-]+$ – Felipe Quirós Feb 22 '17 at 20:02
