I want to pass a regex to pdfgrep using Python's subprocess module. The code runs without raising an error, but pdfgrep does not seem to receive the argument properly. A test PDF in the current working directory contains the string 'Mary Jane'. Here's my code (Python 3.6):
import subprocess
filtered = ['[A-Z].+Jane'] # the list of regexes is shortened to one string, to keep the example simple.
for regex in filtered:
    arg = 'pdfgrep -PrH ' + f"{regex}"
    process_match = subprocess.run(arg, stdout=subprocess.PIPE, shell=True)
The expected result is that process_match would contain a CompletedProcess() object containing the match.
But instead, it returns the following:
CompletedProcess(args="pdfgrep -PrH '[A-Z].+Jane'", returncode=127, stdout=b'')
At the command line, invoking the same pdfgrep command finds the matching PDF. And I can do the task fairly trivially in Ruby with code like the following:
process_match = %x[pdfgrep -PrH "#{regex}"]
I'm new to Python. What am I getting wrong when trying to pass the regex to the external command?
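For reference, here is a minimal sketch of the two ways subprocess.run can be handed the pattern. It is not a confirmed fix. The helper names (build_pdfgrep_argv, build_pdfgrep_shell_command, echo_argument) are hypothetical and exist only for illustration, and the sketch assumes pdfgrep is installed and on the PATH of the process that Python starts. A return code of 127 is what a POSIX shell reports when it cannot find the command, so the PATH seen by the shell may be worth checking as well as the quoting.

```python
import shlex
import subprocess
import sys


def build_pdfgrep_argv(regex):
    # List form: each element reaches the program as one argument,
    # and no shell is involved, so the regex needs no quoting.
    return ["pdfgrep", "-PrH", regex]


def build_pdfgrep_shell_command(regex):
    # String form for shell=True: shlex.quote protects characters
    # such as '[' and '+' from being interpreted by the shell.
    return "pdfgrep -PrH " + shlex.quote(regex)


def echo_argument(arg):
    # Stand-in for pdfgrep: a child Python process prints the single
    # argument it received, showing that the list form passes it intact.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", arg]
    result = subprocess.run(child, stdout=subprocess.PIPE)
    return result.stdout.decode().strip()


def run_pdfgrep(regex):
    # Assumes pdfgrep is on PATH; raises FileNotFoundError otherwise.
    return subprocess.run(build_pdfgrep_argv(regex), stdout=subprocess.PIPE)
```

Calling run_pdfgrep('[A-Z].+Jane') from the directory holding the test PDF would then return a CompletedProcess whose stdout holds whatever pdfgrep printed. The list form avoids shell quoting entirely, which is why it is sketched first.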