
Is there a regular expression, or some other approach, that I could use to remove every instance of three letters followed by a number from a string?

For example, I have this corpus:

c = [CLE2 - Single Smalls station 117,
HOU2 - mathdenn,
[SAT2] Pack Singles > Line 7 > Station 04 Kiosk Ticket - ZT410 Shipping Label Not Printing Correcly,
[HOU2] Multiple GW Stations Down in AFE2]

I want to apply a function that returns:

c = [Single Smalls station 117
 - mathdenn,
 Pack Singles > Line 7 > Station 04 Kiosk Ticket - ZT410 Shipping Label Not Printing Correcly
 Multiple GW Stations Down in ]

I'm looking for a neat, Pythonic way of achieving this. I have read a little about regular expressions online, but I have not found a way to specify that I want to remove instances of three characters followed by a number, so what I found does not answer my question.

I tried doing something like this:

regex = re.compile('[a-z][0-9]')
regex.findall(c[0])

But this only returns instances where a single character is followed by a number. Could it be modified to do what I want?

Wolfy
    Apparently you don't know where to start with your regex. Please check out [Reference - What does this regex mean resource](https://stackoverflow.com/questions/22937618), and [Learning Regular Expressions](https://stackoverflow.com/questions/4736) for more info on regex. – Christian Baumann Oct 08 '20 at 12:57

1 Answer


Here is a Pythonic way:

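import re

# The whole corpus is kept as one multi-line string.
c = '''[CLE2 - Single Smalls station 117,
HOU2 - mathdenn,
[SAT2] Pack Singles > Line 7 > Station 04 Kiosk Ticket - ZT410 Shipping Label Not Printing Correcly,
[HOU2] Multiple GW Stations Down in AFE2]'''

# Capture every token of three uppercase letters plus one digit that is
# surrounded by non-word characters, then delete each one from the string.
for substr in re.findall(r'\W([A-Z][A-Z][A-Z]\d)\W', c):
    c = c.replace(substr, '')
# Drop the empty brackets left behind by tokens such as [SAT2].
c = c.replace('[]', '')
print(c)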
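
As an alternative sketch (not the code above), `re.sub` with word boundaries can do the removal in a single pass per string. This assumes the corpus is a list of strings rather than one multi-line string; the names `SITE_CODE` and `strip_site_codes` are made up for illustration.

```python
import re

# Three uppercase letters followed by one digit, as a whole word,
# optionally wrapped in square brackets (e.g. "CLE2" or "[SAT2]").
SITE_CODE = re.compile(r'\[?\b[A-Z]{3}\d\b\]?')

def strip_site_codes(text):
    """Remove matching tokens and trim leftover outer whitespace."""
    return SITE_CODE.sub('', text).strip()

# Assumption: the corpus from the question, stored as a list of strings.
corpus = [
    'CLE2 - Single Smalls station 117',
    'HOU2 - mathdenn',
    '[SAT2] Pack Singles > Line 7 > Station 04 Kiosk Ticket - ZT410 Shipping Label Not Printing Correcly',
    '[HOU2] Multiple GW Stations Down in AFE2',
]

cleaned = [strip_site_codes(s) for s in corpus]
for line in cleaned:
    print(line)
```

The `\b` anchors keep the pattern from matching inside longer tokens, so `ZT410` is left alone here. The leading ` - ` separator is not removed, since the pattern only targets the code itself.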
Aaj Kaal
  • Could this also be extended so that we can remove any string instances that have a character followed by a number? ZT410 needs to be removed as well. – Wolfy Oct 08 '20 at 13:43
    for substr in re.findall(r'\W([A-Z][A-Z]\d{3})\W', c): c = c.replace(substr, '') – Aaj Kaal Oct 08 '20 at 13:51
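
If "a character followed by a number" is taken to mean any whole token of uppercase letters followed by digits, one pattern can cover both `CLE2` and `ZT410`. The sketch below rests on that assumption; `CODE` and `remove_codes` are hypothetical names, and the pattern may need adjusting if lowercase tokens should also be removed.

```python
import re

# Assumption: a "code" is a whole token of one or more uppercase letters
# followed by one or more digits, optionally wrapped in square brackets.
CODE = re.compile(r'\[?\b[A-Z]+\d+\b\]?')

def remove_codes(text):
    """Delete code tokens, then collapse the double spaces they leave."""
    without_codes = CODE.sub('', text)
    return re.sub(r' {2,}', ' ', without_codes).strip()

line = ('[SAT2] Pack Singles > Line 7 > Station 04 Kiosk Ticket - '
        'ZT410 Shipping Label Not Printing Correcly')
print(remove_codes(line))
print(remove_codes('Multiple GW Stations Down in AFE2'))
```

Plain numbers such as `7` or `04` and all-letter words such as `GW` are not matched, because the pattern requires letters immediately followed by digits within a single token.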