I am trying to build a combined regular expression, but I don't know how to combine the two sub-expressions.

  • I have an input string like this: 4711_001.doc
  • I want to match the following: 4711.doc
  • I am able to match 4711 with this expression: [^\_\.]*
  • I am able to match .doc with this expression: \.[^.]+

Is there some kind of logical AND to combine the two expressions and match 4711.doc? What would the expression look like?

Wai Ha Lee

3 Answers

You can use groups to do it in one regular expression. Check out this code for reference:

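import re

s = "4711_001.doc"
# Group 1: lazily matched text before "_"; group 2: the dot and the extension.
match = re.search(r"(.+?)_\d+(\..+)", s)
if match:  # re.search returns None when the pattern does not match
    print(match.group(1) + match.group(2))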

Output:

4711.doc
Aniket Tiratkar

Another possibility would be to match the part you don't want:

_\d+

And replace this with "":

import re

s = "4711_001.doc"
# Remove the underscore and the digits that follow it.
result = re.sub(r"_\d+", "", s)
print(result)

JvdV

For this example string 4711_001.doc, using [^_.]* and \.[^.]+ is quite a broad match, because a negated character class matches any character except the ones listed in it.
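To illustrate how broad these classes are, here is a small sketch, assuming Python (the question has no language tag). The sample strings are made up for the demonstration.

```python
import re

# Both sub-expressions accept spaces, punctuation, and letters,
# because they only exclude "_" and "." (or just ".").
head = re.match(r"[^_.]*", "ab c!_001.doc").group()
ext = re.search(r"\.[^.]+", "4711_001.d o-c").group()

print(head)  # ab c!
print(ext)   # .d o-c
```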

Perhaps you could make the pattern a bit more specific, matching digits at the start and word characters as the extension.

In the replacement, use capture groups 1 and 2, often written as $1$2 or \1\2:

(\d+)_\d+(\.\w+)
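As a minimal sketch of that replacement, assuming Python (the question has no language tag), the pattern can be passed to re.sub, where \1 and \2 refer to the two capture groups. The helper name strip_suffix is hypothetical.

```python
import re

# Group 1: leading digits; group 2: the dot plus a word-character extension.
PATTERN = re.compile(r"(\d+)_\d+(\.\w+)")

def strip_suffix(name: str) -> str:
    """Drop the "_<digits>" part, keeping the leading number and the extension."""
    return PATTERN.sub(r"\1\2", name)

print(strip_suffix("4711_001.doc"))  # 4711.doc
```

A name that does not match the pattern is returned unchanged.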

No language is tagged, but if \K is supported (it clears the match buffer, so the text matched so far is not part of the result), this might also be an option, reusing the parts that you tried.

In the replacement use an empty string.

[^_.]*\K_[^._]+(?=\.[^.]+$)

In parts

  • [^_.]*\K Match the part before the underscore, then forget what has been matched so far
  • _[^._]+ Match the underscore, followed by 1+ chars other than . and _
  • (?=\.[^.]+$) A positive lookahead asserting that what follows is a . and then 1+ chars other than . up to the end of the string
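Python's built-in re module does not support \K, so the sketch below, assuming Python, emulates it: the part before the underscore is captured and put back in the replacement instead of being dropped from the match. The function name remove_middle is hypothetical.

```python
import re

# ([^_.]*) stands in for "[^_.]*\K": the text is captured and restored
# with \1 rather than being excluded from the match.
PATTERN = re.compile(r"([^_.]*)_[^._]+(?=\.[^.]+$)")

def remove_middle(name: str) -> str:
    """Remove the "_xxx" part that sits directly before the final extension."""
    return PATTERN.sub(r"\1", name)

print(remove_middle("4711_001.doc"))  # 4711.doc
```

With an engine that does support \K, the original pattern and an empty replacement string should give the same result.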

The fourth bird