
I've been working on this file rename program for a few days now. I've learned a lot thanks to all of the "silly" questions those before me have asked on this site and the quality answers they have received. Well, on to my problem.

My filenames are in the following format: ACP001.jpg, ACP002.jpg,... ACP010.jpg, ACP011.jpg, ACP012_x.jpg, ACP013.jpg, ACP014_x.jpg

pattern = r'(ACP0)(0*)(\d+)(\.jpg)'
replace = r'\3\4'
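Running the first pattern over a few of the sample names shows why the `_x` files need special handling. The short list below is just an illustrative subset of the names from the question. Names without the suffix are rewritten, but `re.sub` returns `ACP012_x.jpg` unchanged because the pattern never matches it.

```python
import re

# Pattern and replacement copied from the question.
pattern = r'(ACP0)(0*)(\d+)(\.jpg)'
replace = r'\3\4'

names = ['ACP001.jpg', 'ACP010.jpg', 'ACP012_x.jpg']  # illustrative subset
renamed = [re.sub(pattern, replace, name) for name in names]
print(renamed)  # ['1.jpg', '10.jpg', 'ACP012_x.jpg']
```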

That worked for most of them, but some files have "_x" just before the file extension. I amended the pattern and the replacement string as follows:

pattern = r'(ACP0)(0*)(\d+)(_w)*(\.jpg)'
replace = r'\3.jpg'

I think I cheated by hardcoding the ".jpg" in the replace string. How would I handle these situations where the match object groups may be of varying sizes? I essentially want the last group and the third group in this example.
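The extension is already captured in group 5 of the second pattern, so one way to avoid hardcoding ".jpg" is to refer to that group with `\5`. The sketch below also compares the pattern as posted, where `(_w)*` looks for a literal "_w", against a copy with "_w" changed to "_x". That change is an assumption about the intended suffix, based on the filenames listed above.

```python
import re

names = ['ACP013.jpg', 'ACP012_x.jpg']
# Reuse the captured extension (group 5) instead of hardcoding ".jpg".
replace = r'\3\5'

# Second pattern from the question: "(_w)*" matches a literal "_w".
as_posted = r'(ACP0)(0*)(\d+)(_w)*(\.jpg)'
# Same pattern with "_w" changed to "_x" (assumed to be the intended suffix).
with_x = r'(ACP0)(0*)(\d+)(_x)*(\.jpg)'

posted_result = [re.sub(as_posted, replace, n) for n in names]
fixed_result = [re.sub(with_x, replace, n) for n in names]
print(posted_result)  # ['13.jpg', 'ACP012_x.jpg']
print(fixed_result)   # ['13.jpg', '12.jpg']
```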

Asked by RodNICE (edited by jps)
  • Do you have `w` or `\w` in your 2nd pattern? Also, if you have that additional group defined and you need to keep it in the result, try adding it to the replacement pattern, [`r'\3\4\5'`](https://regex101.com/r/xqqmHY/2). – Wiktor Stribiżew Sep 10 '18 at 09:40
  • RodNICE, could you please explain what the final output should look like? Also, why use 4 groups in the pattern if you only keep the last 2? What can `x` be? – Wiktor Stribiżew Sep 10 '18 at 09:59
  • I do apologize for my novice use of regex. I'm going through Al Sweigart's book on Python and I'm on chapter 9, trying to give myself a project that will help me understand it all. I need to reread chapters 8 and 9 because I couldn't understand why I was grouping everything with parentheses. Expected output: 1.jpg, 2.jpg,... 10.jpg, 11.jpg, 12.jpg, 13.jpg, 14.jpg – RodNICE Sep 10 '18 at 13:36

2 Answers


Make the _x term optional:

pattern = r'(ACP0)(0*)(\d+)(_x)?(\.jpg)'
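With the optional group in place, the extension can be taken from group 5 rather than hardcoded. The quick check below runs that over the filenames listed in the question. The replacement `r'\3\5'` is an assumption here, since this answer only gives the pattern. The expected result is the one RodNICE states in the comments.

```python
import re

pattern = r'(ACP0)(0*)(\d+)(_x)?(\.jpg)'
# Keep the number (group 3) and the extension (group 5); group 4 is the optional "_x".
replace = r'\3\5'

names = ['ACP001.jpg', 'ACP002.jpg', 'ACP010.jpg', 'ACP011.jpg',
         'ACP012_x.jpg', 'ACP013.jpg', 'ACP014_x.jpg']
renamed = [re.sub(pattern, replace, name) for name in names]
print(renamed)
# ['1.jpg', '2.jpg', '10.jpg', '11.jpg', '12.jpg', '13.jpg', '14.jpg']
```

Group 4 is never referenced, so it does not matter whether `_x` matched. Since Python 3.5, `re.sub` also substitutes an empty string for a referenced group that did not participate in the match, so a replacement such as `r'\3\4\5'` would not raise an error either (it would simply keep the `_x`).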

I don't actually know why you have so many capture groups in your pattern. I would have written it this way:

pattern = r'ACP(\d{3})(_x)?\.jpg'
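With this shorter pattern, `(\d{3})` captures the number with its leading zeros ("001"), so a plain `\1` replacement would give "001.jpg". One way to drop the zeros is to pass a function to `re.sub` and convert the group with `int()`. The helper name `to_number_name` is hypothetical, and ".jpg" is written out because this pattern matches only `.jpg` files anyway.

```python
import re

pattern = r'ACP(\d{3})(_x)?\.jpg'

def to_number_name(match):
    # Hypothetical helper: int() drops the leading zeros, e.g. "012" -> 12.
    return f'{int(match.group(1))}.jpg'

names = ['ACP001.jpg', 'ACP010.jpg', 'ACP012_x.jpg']
renamed = [re.sub(pattern, to_number_name, name) for name in names]
print(renamed)  # ['1.jpg', '10.jpg', '12.jpg']
```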
Answered by Tim Biegeleisen

You can use `.` to match any character except a newline. Since the OP wants to rename every file to just its number (ACP001.jpg -> 1.jpg), you can use the following pattern and replacement string:

import re  # built-in module for regular expressions

li = ['ACP001.txt', 'ACP012.txt', 'ACP013_x.jpg']  # list of filenames
pattern = r'(ACP)(0*)(\d+)(.*)(\.\w+)'
replace = r'\3\5'  # keep only the number (group 3) and the extension (group 5)
res = [re.sub(pattern, replace, st) for st in li]
print(res)

OUTPUT

['1.txt', '12.txt', '13.jpg']

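This works for any extension made of word characters after the last dot (`\.\w+`), and because `(.*)` absorbs `_x` or any other suffix, the replacement `\3\5` never depends on a group that might not have matched.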
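Since the question is about a file rename program, here is a sketch of applying this pattern to files on disk with `pathlib`. The helper name `rename_numbered`, skipping names that do not match, and refusing to overwrite an existing file are all assumptions for illustration. The demo uses a temporary directory so no real files are touched.

```python
import re
import tempfile
from pathlib import Path

# Pattern and replacement from the answer above.
PATTERN = re.compile(r'(ACP)(0*)(\d+)(.*)(\.\w+)')
REPLACE = r'\3\5'

def rename_numbered(directory):
    """Hypothetical helper: rename every matching file in `directory`."""
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb iteration.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        new_name = PATTERN.sub(REPLACE, path.name)
        if new_name == path.name:
            continue  # no match, leave the file alone
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(target)  # avoid silently overwriting a file
        path.rename(target)
        renamed.append(new_name)
    return renamed

# Demo on a throwaway directory instead of real photos.
with tempfile.TemporaryDirectory() as tmp:
    for name in ['ACP001.jpg', 'ACP012_x.jpg', 'notes.txt']:
        (Path(tmp) / name).touch()
    print(rename_numbered(tmp))                        # ['1.jpg', '12.jpg']
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['1.jpg', '12.jpg', 'notes.txt']
```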

Answered by njras
  • Thanks for contributing and welcome to stackoverflow! Please describe what you changed and why you did it, so that the question's author and future visitors can learn from your answer. – slartidan Sep 10 '18 at 11:57