import os

target_dir = "xxx.xxx.xx.xx/path/to/file/dir"
start_seq = "*** Start Sequence ***"
end_seq = "*** End Sequence ***"

def tp_parser(file):
    with open(file) as in_f:
        lines = in_f.readlines()
        f_name = in_f.name[12:16]

        for i, line in enumerate(lines):
            if line.startswith(start_seq):
                start_line = i
            elif line.startswith(end_seq):
                end_line = i

        with open("{0}_Target_Map.txt".format(f_name), "w") as out_f:
            for i, line in enumerate(lines):
                if start_line <= i < end_line:
                    print(line)
                    # out_f.write(line)

for file in os.listdir(os.chdir(target_dir)):
    tp_parser(file)

I wrote this script to look through a directory of files, extract a specific part of each file, and write it out to a separate text file. I'm curious whether someone can shed some light on what is happening here.

For this part:

if start_line <= i < end_line:
    print(line)
    # out_f.write(line)

If I run the script with print(line), I get an "UnboundLocalError: local variable 'end_line' referenced before assignment" error. However, running the script with out_f.write(line) works as intended.

The second, slightly less annoying problem is this part:

for file in os.listdir(os.chdir(target_dir)):
    tp_parser(file)

I can't explain to myself why I have to switch to the working directory (i.e. os.chdir(target_dir)) to actually iterate through the files. I'm aware that os.listdir() by itself returns a list of file names, but how is that any different if you pass an os.chdir() call as the argument to os.listdir()?

Thanks in advance.

blackmore5

1 Answer


Starting from the end: you don't have to switch to the working dir to iterate. I suppose you want to list the target dir, right? So you either make it the current dir with chdir, or you pass the dir to listdir. You are using the first option, but chdir changes the dir as a side effect, not as its result: chdir returns None. So you are calling listdir with None, which happens to accept None as a hint to use the current dir. You can write

os.chdir(target_dir)
for file in os.listdir():
    tp_parser(file)

or

for file in os.listdir(target_dir):
    tp_parser(file)
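
One caveat with the second form, shown here as a minimal sketch: os.listdir returns bare file names, not full paths, so each name has to be joined with the directory before it is passed to open(), unless the current dir has been changed first. The helper name iter_file_paths and the throwaway directory are made up for illustration and are not part of the original script.

```python
import os
import tempfile


def iter_file_paths(directory):
    """Yield the full path of every regular file in `directory` (hypothetical helper)."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)  # listdir gives names only
        if os.path.isfile(path):  # skip sub-directories
            yield path


# Demo on a throwaway directory, so nothing here depends on the real target_dir.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "a.txt"), "w") as f:
        f.write("hello\n")
    demo_names = os.listdir(demo_dir)             # bare names
    demo_paths = list(iter_file_paths(demo_dir))  # names joined with the dir
    with open(demo_paths[0]) as f:
        demo_text = f.read()
```

With something like this, the loop could become for path in iter_file_paths(target_dir): tp_parser(path), with no chdir needed.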

As for the first problem, notice that start_line and end_line are only assigned inside conditional branches. I suspect that for some file one of the conditions is never met, so the name is still unassigned when it is used later in the code, and that raises the error.
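
One possible way to guard against this, as a sketch only: initialise both names to None before the loop and treat a file with a missing marker as having no section. The function name find_section and the sample lists are assumptions for illustration; the marker strings are the ones from the question.

```python
START_SEQ = "*** Start Sequence ***"
END_SEQ = "*** End Sequence ***"


def find_section(lines, start_seq=START_SEQ, end_seq=END_SEQ):
    """Return (start_line, end_line), or None if either marker is missing."""
    start_line = None  # always bound, even if no line matches
    end_line = None
    for i, line in enumerate(lines):
        if line.startswith(start_seq):
            start_line = i
        elif line.startswith(end_seq):
            end_line = i
    if start_line is None or end_line is None:
        return None
    return start_line, end_line


# Made-up sample data: one complete file and one without the end marker.
complete = ["header\n", "*** Start Sequence ***\n", "data\n", "*** End Sequence ***\n"]
truncated = complete[:3]
```

The caller can then skip any file for which find_section returns None instead of hitting an UnboundLocalError.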

Edit:

There is another problem with this code: you are writing your output files into the same dir that you are listing, so the next time you run the code it will also parse your previous output files. I suppose this is not the intended behaviour. And if you look further, you'll see that your output files don't have the end marker, as your output condition excludes it:

if start_line <= i < end_line: # use of < instead of <= end_line excludes end marker from output

So the error in that case simply comes from malformed new input files appearing amongst the others. These are erroneous coincidences (user error), not unexpected behaviour of the Python code.
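
A rough sketch of one way to avoid both issues, assuming the end marker should be kept and that a separate output dir is acceptable: slice the lines inclusively and write the result somewhere other than the dir being listed. The name write_section, the input/output directory names and the sample lines are all hypothetical.

```python
import os
import tempfile


def write_section(lines, start_line, end_line, out_path):
    """Write lines start_line..end_line (end marker included) to out_path."""
    with open(out_path, "w") as out_f:
        out_f.writelines(lines[start_line:end_line + 1])


demo_lines = [
    "header\n",
    "*** Start Sequence ***\n",
    "data\n",
    "*** End Sequence ***\n",
    "footer\n",
]

with tempfile.TemporaryDirectory() as base:
    in_dir = os.path.join(base, "input")    # hypothetical dir that gets listed
    out_dir = os.path.join(base, "output")  # hypothetical, separate output dir
    os.makedirs(in_dir)
    os.makedirs(out_dir)
    out_path = os.path.join(out_dir, "demo_Target_Map.txt")
    write_section(demo_lines, 1, 3, out_path)
    with open(out_path) as f:
        written = f.readlines()
    input_listing = os.listdir(in_dir)  # output never shows up among the inputs
```

Because the output lands outside the listed dir, a second run sees the same input files as the first one.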

progmatico
  • Thanks for the answer, much appreciated! Two follow-up questions though: 1. Running: for file in os.listdir(target_dir): tp_parser(file) gives me a file not found error. This, I presume, came from passing a bare filename string to open(file), which is what prompted me to use os.chdir() (either on top or within listdir()) as a workaround. 2. I think you are right about the condition not being met, but I'm curious why that wouldn't throw the same error when using out_f.write(line)? – blackmore5 Dec 05 '17 at 15:32
  • That is just coincidence, as I suspected ;) See my answer edit. – progmatico Dec 05 '17 at 15:53