First, I know there are similar questions, but I could not find anything that really matched my case.

I have a large string from which I want to remove some substrings.

import re
largeString = 'abcdefgTHIS NEEDS TO GO?abdehdfenTHIS NEEDS TO GO TOO?asjdhnasjdf'
itemList =['THIS NEEDS TO GO?','THIS NEEDS TO GO TOO?']
<<<some code>>>
Out: 'abcdefgabdehdfenasjdhnasjdf'

I tried this:

for i in itemList:
  largeString = re.sub(str(i), '', largeString.rstrip())

The problem is the question mark in the search strings. I know that normally I would write 'THIS NEEDS TO GO\?'. Unfortunately, the list is very large and I cannot change it manually.

Is there a way to make the regex treat the question mark as a literal character?

Any help or idea is appreciated!

Thanks!

Chris_589
  • Does this answer your question? [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – rsjaffe Apr 22 '20 at 08:03
  • Your example is really simple and is something a simple `replace()` would also achieve and doesn't have the problem with the question mark `?`. You are also applying the regex inefficiently if you are storing a bunch of patterns that is then turned into nothing. And by "disallowing" the question mark it is not a valid regex pattern. You should implement regex properly or look for something like `replace` if your use case is really that simple. – Tin Nguyen Apr 22 '20 at 08:07
  • No, in reality the use case is not that simple, of course. I already got the help I needed in one of the answers. Thanks for your suggestion, I will have a look at the replace function as well! – Chris_589 Apr 22 '20 at 08:10
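
For the literal-removal case shown in the question, the `str.replace()` route mentioned in the comments might look roughly like the sketch below. It reuses the question's sample data; the helper name `remove_literals` is made up for illustration. Because `str.replace()` treats its argument as plain text, the question mark needs no escaping.

```python
largeString = 'abcdefgTHIS NEEDS TO GO?abdehdfenTHIS NEEDS TO GO TOO?asjdhnasjdf'
itemList = ['THIS NEEDS TO GO?', 'THIS NEEDS TO GO TOO?']


def remove_literals(text, items):
    """Remove every literal occurrence of each item (hypothetical helper)."""
    for item in items:
        # str.replace matches the item character by character, so '?' is not special.
        text = text.replace(item, '')
    return text


result = remove_literals(largeString, itemList)
print(result)
```

This only covers plain substrings; as soon as real regex features are needed, the items have to be escaped instead.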

1 Answer

If the strings in itemList contain only literal text (no regex syntax), you can pass each one through re.escape before using it as a search pattern. As the name suggests, it goes over the string and escapes every character that has a special meaning in a regex.

import re
largeString = 'abcdefgTHIS NEEDS TO GO?abdehdfenTHIS NEEDS TO GO TOO?asjdhnasjdf'
itemList =['THIS NEEDS TO GO?','THIS NEEDS TO GO TOO?']
for item in itemList:
  largeString = re.sub(re.escape(item), '', largeString)

Output:

>>> largeString
'abcdefgabdehdfenasjdhnasjdf'
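
Since the question says the list is very large, one possible variation (an assumption on my part, not something the question asked for) is to join all escaped items into a single alternation and make one pass over the string. Sorting longest first is a precaution for the case where one item is a prefix of another. Note that, unlike the loop above, a single pass does not rescan text that only becomes a match after an earlier removal.

```python
import re

largeString = 'abcdefgTHIS NEEDS TO GO?abdehdfenTHIS NEEDS TO GO TOO?asjdhnasjdf'
itemList = ['THIS NEEDS TO GO?', 'THIS NEEDS TO GO TOO?']

# Escape each item so it is matched literally, longest items first.
escaped = [re.escape(item) for item in sorted(itemList, key=len, reverse=True)]

# Build one pattern of the form item1|item2|... and compile it once.
pattern = re.compile('|'.join(escaped))

cleaned = pattern.sub('', largeString)
print(cleaned)
```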
Hampus Larsson
  • Yes that works! Great, I didn't know the escape expression. Thanks a lot! – Chris_589 Apr 22 '20 at 08:08
  • As your edit says, this would no longer be regex. The special meaning of the question mark would no longer apply, but neither would that of many other characters. A proper approach to this would be to use `str.replace()`. – Tin Nguyen Apr 22 '20 at 08:10
  • @TinNguyen absolutely, that would be a better solution for this specific problem. However, if they ever need to use this in conjunction with a regular-expression, they could just concatenate the regex to the `re.escape` sequence. – Hampus Larsson Apr 22 '20 at 08:24
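
The concatenation mentioned in the last comment could be sketched like this: the literal part goes through `re.escape`, and a real regex fragment is appended to it. The sample text and the `\s*` suffix (also swallow any whitespace after the removed item) are assumptions chosen purely for illustration.

```python
import re

# Hypothetical sample input, not taken from the question.
text = 'keep THIS NEEDS TO GO?   this'
item = 'THIS NEEDS TO GO?'

# Escaped literal followed by an ordinary regex fragment.
pattern = re.escape(item) + r'\s*'

result = re.sub(pattern, '', text)
print(result)
```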