I need to extract the real issue number in my file name. There are 2 patterns:
- if there is no leading number in the file name, then the number, which we read first, is the issue number. For example
asdasd 213.pdf ---> 213
abcd123efg456.pdf ---> 123
- however, sometimes there is a leading number in the file name, which is just the index of file, so I have to ignore/skip it firstly. For example
123abcd 4567sdds.pdf ---> 4567, since 123 is ignored
890abcd 123efg456.pdf ---> 123, since 890 is ignored
I want to learn whether it is possilbe to write only one regular expression to implement it? Currently, my soluton involves 2 steps:
- if there is a leading number, remove it
- find the number in the remaining string
or in Python code
import re
reNumHeading = re.compile('^\d{1,}', re.IGNORECASE | re.VERBOSE) # to find leading number
reNum = re.compile('\d{1,}', re.IGNORECASE | re.VERBOSE) # to find number
lstTest = '''123abcd 4567sdds.pdf
asdasd 213.pdf
abcd 123efg456.pdf
890abcd 123efg456.pdf'''.split('\n')
for test in lstTest:
if reNumHeading.match(test):
span = reNumHeading.match(test).span()
stripTest = test[span[1]:]
else:
stripTest = test
result = reNum.findall(stripTest)
if result:
print(result[0])
thanks