1

Suppose I have a string:

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'

I tried to replace F3·Compute· with '' using a regex:

In [23]: re.sub(r'F3?Compute?', '',s)
Out[23]: 'F3·Compute·Introduction to Methematical Thinking.pdf'

It did not work as I intended.

When I used the literal text as the pattern, it worked:

In [21]: re.sub(r'F3·Compute·', '', 'F3·Compute·Introduction to Methematical Thinking.pdf')
Out[21]: 'Introduction to Methematical Thinking.pdf'

What's the problem with my regex pattern?

2 Answers

-1

Use a dot (.) to match any single character:

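# -*- coding: utf-8 -*-
# Python 2 code: s is a byte string and must be decoded before matching.
import re

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'
# Each dot matches one Unicode character, here the '·' separator.
output = re.sub(r'F3.Compute.', '', unicode(s, "utf-8"), flags=re.U)
print output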

Your original pattern, 'F3?Compute?', does not do what you intended. In a regex, ? makes the preceding token optional, so this pattern matches F, then an optional 3, then Comput, then an optional e. Nothing in it accounts for the separator characters, so it never matches your string.
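
To make that concrete, here is a small sketch (Python 3 assumed; the test strings are made up for illustration) showing which inputs the original pattern does match and why the real string is left untouched:

```python
import re

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'

# 'F3?Compute?' reads as: 'F', optional '3', 'Comput', optional 'e'.
pattern = re.compile(r'F3?Compute?')

# Hypothetical inputs with no separator after 'F3': all of them match.
candidates = ('F3Compute', 'FCompute', 'F3Comput', 'FComput')
matches = [t for t in candidates if pattern.fullmatch(t)]
print(matches)

# The real string has a separator right after 'F3', so nothing matches
# and re.sub returns the input unchanged.
found = pattern.search(s)
unchanged = pattern.sub('', s) == s
print(found, unchanged)
```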

Note also that, in Python 2, we must match against the unicode version of the string rather than the byte string. In a UTF-8 byte string the separator occupies more than one byte, so a single dot cannot match it. (In Python 3, str is already Unicode and no conversion is needed.) Have a look at the demo below for more information.

Demo
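
The byte-versus-character point can also be checked in Python 3, where the distinction is explicit. This is only a sketch, assuming the string is encoded as UTF-8; the variable names are illustrative:

```python
import re

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'
raw = s.encode('utf-8')  # roughly what a Python 2 byte string holds

# The separator takes more than one byte in UTF-8.
sep_len = len(s[2].encode('utf-8'))
print(sep_len)

# On bytes, one dot consumes only one byte of the separator: no match.
bytes_result = re.sub(rb'F3.Compute.', b'', raw)
print(bytes_result == raw)

# On text, one dot consumes the whole separator character.
text_result = re.sub(r'F3.Compute.', '', s)
print(text_result)
```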

Tim Biegeleisen
-1

The question mark ? does not stand in for a single character in regular expressions. It means zero or one occurrence of the preceding token, which in your pattern is the 3 and the final e. What you're looking for is the dot: . is a wildcard that matches any single character (except a newline, by default). It has nothing to do with your middle-dot character; the resemblance is just a coincidence.

re.sub(r'F3.Compute.', '', s)
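
Because . accepts any character, the pattern above is fairly permissive. As a sketch of a stricter alternative (Python 3 assumed; the prefix variable and the ^ anchor are my additions, not part of the original question), you can match the literal separator instead:

```python
import re

s = 'F3·Compute·Introduction to Methematical Thinking.pdf'

# Wildcard version: each '.' matches any one character.
loose = re.sub(r'F3.Compute.', '', s)

# Literal version: re.escape guards against regex metacharacters in the
# prefix, and '^' restricts the match to the start of the string.
prefix = 'F3·Compute·'
strict = re.sub('^' + re.escape(prefix), '', s)

# The wildcard version also strips look-alike prefixes such as this one.
also_stripped = re.sub(r'F3.Compute.', '', 'F3-Compute-notes.txt')

print(loose)
print(strict)
print(also_stripped)
```

If the prefix is always a fixed literal, plain string methods would also do the job, but the regex form stays closer to the original attempt.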
slackwing