Combine two regular expressions with a logical "and" operator

Question

I am trying to build a combined regular expression, but I don't know how to combine the two subexpressions.

I have an input string like this: 4711_001.doc
I want to match the following: 4711.doc
I am able to match 4711 with this expression: [^\_\.]*
I am able to match .doc with this expression: \.[^.]+

Is there some kind of logical AND to combine the two expressions and match 4711.doc? What would the expression look like?

Replace `^([^_.]+).*(\.[^.]+)$` with `$1$2`, see https://regex101.com/r/56A0YS/1/ — Wiktor Stribiżew, Nov 13 '20 at 13:31
Not sure why everyone thinks the question is about Python :) What is your coding environment? — Wiktor Stribiżew, Nov 13 '20 at 14:15
@WiktorStribiżew I didn't realize that the question is not about Python till I read your comment. I hope Stefan J. understands Python. — Aniket Tiratkar, Nov 13 '20 at 14:49
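
The replacement suggested in the first comment can be tried out in Python as a minimal sketch. The function name `strip_middle` is made up for illustration, and `\1\2` is Python's spelling of the `$1$2` replacement; other environments may use different syntax.

```python
import re

# Pattern from the comment: group 1 is the text before the first "_" or ".",
# group 2 is the last "." plus the extension; everything in between is dropped.
PATTERN = re.compile(r"^([^_.]+).*(\.[^.]+)$")


def strip_middle(name: str) -> str:
    """Hypothetical helper: keep only the leading part and the extension."""
    # Strings that do not match the pattern are returned unchanged.
    return PATTERN.sub(r"\1\2", name)


print(strip_middle("4711_001.doc"))
```

A regular expression has no "and" operator that joins two separate matches. Capturing both parts and concatenating them in the replacement gives the same effect.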

Aniket Tiratkar · Answer 1 · 2020-11-13T13:42:06.787

3

You can use groups to do it in one regular expression. Check out this code for reference:

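import re

s = "4711_001.doc"
# Group 1: the text before "_<digits>"; group 2: the dot and the extension
match = re.search(r"(.+?)_\d+(\..+)", s)
if match:  # re.search returns None when the pattern does not match
    print(match.group(1) + match.group(2))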

Output:

4711.doc

edited Nov 13 '20 at 13:42

answered Nov 13 '20 at 13:36

Aniket Tiratkar


score 1 · Answer 2 · answered Nov 13 '20 at 14:05


Another possibility would be to match the part you don't want:

_\d+

And replace this with "":

import re

s = "4711_001.doc"
# re.sub returns the new string, not a match object
result = re.sub(r"_\d+", "", s)
print(result)


answered Nov 13 '20 at 14:05

JvdV


score 1 · Answer 3 · answered Nov 13 '20 at 15:53

For this example string 4711_001.doc, using [^_.]* and \.[^.]+ is quite a broad match as it can match any character except what is listed in the character class.

Perhaps you could make the pattern a bit more specific, matching digits at the start and word characters as the extension.

In the replacement use capture group 1 and 2, often denoted as $1$2 or \1\2

(\d+)_\d+(\.\w+)
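
As a rough illustration, the stricter pattern above could be applied in Python like this. The function name `drop_counter` is hypothetical, and `\1\2` is the Python form of the replacement.

```python
import re


def drop_counter(name: str) -> str:
    """Hypothetical helper: remove "_<digits>" between a numeric prefix and the extension."""
    # Group 1 is the leading digits, group 2 the dot plus word characters.
    return re.sub(r"(\d+)_\d+(\.\w+)", r"\1\2", name)


print(drop_counter("4711_001.doc"))
```

Because the pattern requires digits in front of the underscore, a name such as report_001.doc is left untouched.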


No language is tagged, but if \K is supported (it discards what has been matched so far from the reported match), this might also be an option. It reuses the parts that you tried:

In the replacement use an empty string.

[^_.]*\K_[^._]+(?=\.[^.]+$)

In parts

[^_.]*\K Match the part before the underscore, then forget what is matched so far
_[^._]+ Match the underscore, followed by 1+ chars other than . and _
(?=\.[^.]+$) A positive lookahead assertion to make sure that what is directly to the right is a . followed by 1+ chars other than a . up to the end of the string.
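
Python's built-in `re` module does not support \K, so the sketch below swaps in a capture group for the prefix and puts it back in the replacement instead of using an empty string. It is an approximation of the pattern above, not the same technique, and the name `remove_suffix` is made up for illustration.

```python
import re

# The prefix is captured and re-inserted because the standard-library
# re module has no \K; the lookahead is kept as in the original pattern.
PATTERN = re.compile(r"([^_.]*)_[^._]+(?=\.[^.]+$)")


def remove_suffix(name: str) -> str:
    """Hypothetical helper: drop the "_xxx" part directly before the extension."""
    return PATTERN.sub(r"\1", name)


print(remove_suffix("4711_001.doc"))
```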

