I am just picking up and learning Python. For work I go through a lot of PDFs, so I found the PDFMiner tool, which converts a directory of PDFs to text. I then wrote the code below to tell me whether a PDF file is an approved claim or a denied claim. What I don't understand is how to say: find the string that starts with "Tracking Identification Number...", take the 18 characters after it, and put them into an array?
import os
import glob
import csv

def check(filename):
    if 'DELIVERY NOTIFICATION' in open(filename).read():
        isDenied = True
        print ("This claim was Denied")
        print (isDenied)
    elif 'Dear Customer:' in open(filename).read():
        isDenied = False
        print("This claim was Approved")
        print (isDenied)
    else:
        print("I don't know if this is approved or denied")

def iterate():
    path = 'text/'
    for infile in glob.glob(os.path.join(path, '*.txt')):
        print ('current file is:' + infile)
        filename = infile
        check(filename)

iterate()
Any help would be appreciated. This is what the text file looks like:
Shipper Number............................577140Pickup Date....................................06/27/17
Number of Parcels........................1Weight.............................................1 LBS
Shipper Invoice Number..............30057010Tracking Identification Number...1Z000000YW00000000
Merchandise..................................1 S NIKE EQUALS EVERYWHERE T BK B
WE HAVE BEEN UNABLE TO PROVIDE SATISFACTORY PROOF OF DELIVERY FOR THE ABOVE
SHIPMENT. WE APOLOGIZE FOR THE INCONVENIENCE THIS CAUSES.
NPT8AEQ:000A0000LDI 07
----------------Page (1) Break----------------
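One possible approach, as a minimal sketch: a regular expression can match the label, skip its trailing dots, and capture the next 18 characters. The pattern assumes the tracking number is always 18 uppercase letters or digits, as in the sample above. `TRACKING_RE` and `find_tracking_numbers` are names made up for illustration.

```python
import re

# Assumption: the label is followed by one or more dots, then exactly
# 18 uppercase letters or digits (as in the sample text).
TRACKING_RE = re.compile(r"Tracking Identification Number\.+([0-9A-Z]{18})")

def find_tracking_numbers(text):
    """Return every 18-character tracking number that follows the label."""
    return TRACKING_RE.findall(text)

# One line copied from the sample text file
sample = ("Shipper Invoice Number..............30057010"
          "Tracking Identification Number...1Z000000YW00000000")

print(find_tracking_numbers(sample))  # ['1Z000000YW00000000']
```

Because `findall` returns a list, a file containing several claims would yield all of its tracking numbers at once, and a file without the label would yield an empty list instead of raising an error.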
Update: Many helpful answers. Here is the route I took, and it is working quite nicely, if I do say so myself. This is going to save tons of time! Here is the entire code for any future viewers.
import os
import glob
import csv

arrayDenied = []
arrayApproved = []

def iterate():
    path = 'text/'
    for infile in glob.glob(os.path.join(path, '*.txt')):
        print ('current file is:' + infile)
        check(infile)

def check(filename):
    # Read the file once and reuse the text for every test below
    with open(filename, 'rt') as file_contents:
        myText = file_contents.read()
    if 'DELIVERY NOTIFICATION' in myText:
        # The tracking number is the 18 characters right after the label
        start = myText.index("Tracking Identification Number...") + len("Tracking Identification Number...")
        myNumber = myText[start : start+18]
        print("Denied: " + myNumber)
        arrayDenied.append(myNumber)
    elif 'Dear Customer:' in myText:
        print("This claim was Approved")
        startTrackingNum = myText.index("Tracking Identification Number...") + len("Tracking Identification Number...")
        myNumber = myText[startTrackingNum : startTrackingNum+18]
        # The claim number is the 11 characters right after its label
        startClaimNumberIndex = myText.index("Claim Number ") + len("Claim Number ")
        myClaimNumber = myText[startClaimNumberIndex : startClaimNumberIndex+11]
        arrayApproved.append(myNumber + " - " + myClaimNumber)
    else:
        print("I don't know if this is approved or denied")

iterate()

# Write one value per row to each CSV file
with open('Approved.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in arrayApproved:
        writer.writerow([val])

with open('Denied.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in arrayDenied:
        writer.writerow([val])

print(arrayDenied)
print(arrayApproved)
Update: Added the rest of my finished code. It writes the lists to CSV files, where I run some =LEFT() formulas and such, and boom, I have 1000 tracking numbers in a matter of minutes. This is why programming is great.
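For future viewers, the =LEFT() step could probably be skipped by writing the tracking number and the claim number as separate CSV columns. This is a minimal sketch, assuming each approved entry is kept as a (tracking number, claim number) pair instead of one joined string. `write_rows`, the column names, and the sample values are hypothetical, and the demo writes to a temporary directory so it does not touch real files.

```python
import csv
import os
import tempfile

def write_rows(path, header, rows):
    """Write a header row, then one CSV row per item in rows."""
    with open(path, "w", newline="") as output:
        writer = csv.writer(output, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)

# Placeholder values for illustration only, not real claim data
approved = [("1Z000000YW00000000", "00000000000")]

# Write to a temporary directory for the demo
demo_path = os.path.join(tempfile.mkdtemp(), "Approved.csv")
write_rows(demo_path, ["tracking_number", "claim_number"], approved)

with open(demo_path) as f:
    print(f.read())
```

With one value per column, a spreadsheet opens the file already split, so no text formulas should be needed afterwards.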