Arrays returning as empty when using an 'if statement' after glob.glob on FITS Files

Question

I am using glob.glob to make my script read data only from certain FITS files (astropy.io.fits is imported as pf and numpy as np). Pass is the value that I change to select these files. (For reference, x = np.arange(0) and y1 = np.arange(0) simply create empty arrays that I then fill with data later.)

def Graph(Pass):
    x = np.arange(0)
    y1 = np.arange(0)

    pathfile = '*_v0' + str(Pass) + '_stis_f25srf2_proj.fits'

        for name in glob.glob(pathfile):
             imn = 'FilePath' + str(name)

However, I wanted to add another filter to the files that I use. Each FITS file's header contains a quantity I will call a, which is a non-integer numerical value. I only want to read files whose a lies within a specific range. I then take the data I need from the FITS file and add it to an array (here, 'power' p1 is appended to y1 and 'time' t is appended to x).

            imh = pf.getheader(imn)
            a = imh['a']

            if (192 <= a <= 206) is False:
                pass

            if (192 <= a <= 206) is True:
                im = pf.getdata(imn, origin='lower')
                subim1 = im[340:390, 75:120]
                p1 = np.mean(subim1)

                t = SubfucntionToGetTime

                y1 = np.append(y1, p1)
                x = np.append(x, t)

However, when I run this function it returns arrays with no values. I believe it has something to do with my code not working properly when it encounters a file without an appropriate a value, but I don't know how to fix this.

For additional reference, I have tested this on a smaller subgroup of FITS files that I know have the correct a values, and it works fine. That is why I suspect it is encountering a values that mess up the code, as the first few files don't have the correct a values.

OK, I am officially adding this to the finished pile. Thank you Iguananaut for the improved flow for my code and everyone else who answered. Turns out the reason I kept on getting arrays of zero was there was a connection issue to the server in that when I tried to pull a lot of data it just told my computer there wasn't any data. That is now sorted. — Eoin Mansfield, Apr 14 '17 at 16:19
I'm glad you figured it out. TBH you should probably delete this question since it's not likely to be helpful to future readers (which StackOverflow questions should typically be). Happy to help though! — Iguananaut, Apr 14 '17 at 16:22

score 0 · Answer 1 · answered Apr 12 '17 at 17:20

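A guess, but is a a string rather than an integer? (The session below is Python 2; in Python 3 the first comparison raises a TypeError instead of returning False.)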

>>> 192 <= "200" <= 206
False
>>> 192 <= int("200") <= 206
True
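
If the header value might arrive as text, one option is to coerce it to a float before the range check. This is only a minimal sketch: `in_angle_range` is a hypothetical helper name (not part of astropy), and the 192 to 206 limits are taken from the question.

```python
def in_angle_range(value, low=192.0, high=206.0):
    """Return True if value, read as a float, lies in [low, high].

    Hypothetical helper for illustration; not an astropy function.
    """
    try:
        number = float(value)  # accepts 200, 200.5, or "200.5"
    except (TypeError, ValueError):
        # Not numeric at all (e.g. None or "n/a"): treat as out of range
        return False
    return low <= number <= high


print(in_angle_range("200"))   # string input is converted before comparing
print(in_angle_range(186.4))   # a float below the lower limit
```

Coercing explicitly means the check behaves the same way whether the value was stored as a number or as text.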

nigel222

Sorry, I'll make it clearer in my question. *a* is a non-integer numerical value. – Eoin Mansfield Apr 12 '17 at 17:22
In that case I'd insert some debugging print statements to see what `a` values you are *actually* getting rather than what you *think* you are getting. Maybe the wrong files as input? A strong wrong preconception is a barrier to progress. – nigel222 Apr 12 '17 at 17:31
I know for a fact that the *a* values start at around 186 and rise to around 214. This is because the *a* value is the angle between two points and I know how that changes over time. – Eoin Mansfield Apr 12 '17 at 17:37

score 0 · Answer 2 · answered Apr 12 '17 at 18:08

First, ditch the np.append. Use list append instead:

x = []
y1 = []
....
            y1.append(p1)
            x.append(t)

np.arange(0) does create a 0-element array, but you can't fill it. At best it serves to jumpstart the np.append step, which creates a new array containing the new values on every call. arr = np.empty((n,), float) makes an n-element array that can be filled with arr[i] = new_value statements.

List append will be faster, and should give better information on what is being added. If x and y1 remain [], then yes, your filtering is skipping this part of the code. I'd also throw in some print statements to be sure. For example, replace the pass with a print so you actually see which cases are being rejected.

Without your pf file, or whatever it is, we can't reproduce your problem. We can only suggest ways to find out more about what is going on.
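
The list-append approach with a print in place of the pass can be tried without any FITS files. The sketch below assumes a hypothetical list of `(name, a, power, time)` tuples standing in for values read from each file; `collect`, `records`, and the numbers are made up for illustration.

```python
def collect(records, low=192, high=206):
    """Split records into accepted (x, y1) values and rejected names."""
    x, y1, rejected = [], [], []
    for name, a, power, time in records:
        if low <= a <= high:
            y1.append(power)
            x.append(time)
        else:
            # Report the rejection instead of silently passing
            print("skipping %s: a=%r" % (name, a))
            rejected.append(name)
    return x, y1, rejected


# Made-up stand-in values, not real observations
records = [("f1.fits", 186.2, 0.5, 10.0),
           ("f2.fits", 199.7, 0.8, 20.0),
           ("f3.fits", 214.0, 0.3, 30.0)]
x, y1, rejected = collect(records)
print(x, y1, rejected)
```

If every file shows up in the rejected list, the filter is the cause; if nothing is printed at all, the loop never ran and the problem lies upstream, for example in the glob pattern or in reaching the files.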

score 0 · Accepted Answer · answered Apr 14 '17 at 09:43

There's a lot going on here, and the code you posted isn't even valid (has indentation errors). I don't think there's a useful question here for Stack Overflow because you're misusing a number of things without realizing it. That said, I want to be helpful so I'm posting an answer instead of just a comment because I format code better in an answer.

First of all, I don't know what you want here:

pathfile = '*_v0' + str(x) + '.fits'

Because before this you have

x = np.arange(0)

So as you can check, str(x) is just a constant--the string '[]'. So you're saying you want a wildcard pattern that looks like '*_v0[].fits' which I doubt is what you want, but even if it is you should just write that explicitly without the str(x) indirection.

Then in your loop over the glob.glob results you do:

imn = 'FilePath' + str(name)

name should already be a string so no need to str(name). I don't know why you're prepending 'FilePath' because glob.glob returns filenames that match your wildcard pattern. Why would you prepend something to the filename, then?

Next you test (192 <= a <= 206) twice. You only need to check this once, and don't use is True and is False. The result of a comparison is already a boolean so you don't need to make this extra comparison.
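
One concrete way the `is True` / `is False` pattern can go wrong: if `a` happened to be a NumPy scalar rather than a plain Python float (an assumption here, since the question does not say what type the header returns), the comparison yields a NumPy boolean. That object is truthy but is not the built-in `True` or `False`, so neither branch of the original code would run.

```python
import numpy as np

a = np.float64(200.0)          # assumed type, for illustration only
in_range = 192 <= a <= 206     # a NumPy boolean, not the built-in True

print(bool(in_range))          # truthiness is what a plain `if` tests
print(in_range is True)        # identity check against the built-in True
print(in_range is False)       # identity check against the built-in False

if in_range:                   # idiomatic: rely on truthiness
    print("file accepted")
```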

Finally, there's not much advantage to using Numpy arrays here unless you're looping over thousands of FITS files. But using np.append to grow arrays is very slow since in each loop you make a new copy of the array. For most cases you could use Python lists and then--if desired--convert the list to a Numpy array. If you had to use a Numpy array to start with, you would pre-allocate an empty array of some size using np.zeros(). You might guess a size to start it at and then grow it only if needed. Since you're looping over a list of files you could use the number of files you're looping over, for example.
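
For the pre-allocation route, a rough sketch follows. It assumes a hypothetical list of `(a, power, time)` tuples in place of real FITS reads, sizes the arrays by the number of inputs, and trims the unused tail at the end; `collect_preallocated` is an illustrative name only.

```python
import numpy as np


def collect_preallocated(values, low=192, high=206):
    """Fill pre-allocated arrays, then trim to the number of slots used."""
    n = len(values)              # upper bound: one slot per input file
    x = np.zeros(n)
    y1 = np.zeros(n)
    count = 0
    for a, power, time in values:
        if not (low <= a <= high):
            continue             # leave the slot unused for rejected files
        x[count] = time
        y1[count] = power
        count += 1
    # Slicing drops the slots that were never filled
    return x[:count], y1[:count]


# Made-up stand-in values, not real observations
values = [(186.2, 0.5, 10.0), (199.7, 0.8, 20.0), (200.1, 0.9, 30.0)]
x, y1 = collect_preallocated(values)
print(x, y1)
```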

Here's a rewrite of what I think you're trying to do in more idiomatic Python:


import glob

import numpy as np
from astropy.io import fits as pf


def graph(n_pass):
    x = []
    y1 = []

    for filename in glob.glob('*_v0.fits'):
        header = pf.getheader(filename)
        a = header['a']
        if not (192 <= a <= 206):
            # We don't do any further processing for this file
            # for 'a' outside this range
            continue

        im = pf.getdata(filename, origin='lower')
        subim1 = im[340:390, 75:120]
        p1 = np.mean(subim1)
        t = get_time(...)
        y1.append(p1)
        x.append(t)

    # Convert to Numpy arrays once, after the loop, and return them
    return np.array(x), np.array(y1)

You might also consider clearer variable names, etc. I'm sure this isn't exactly what you want to do but maybe this will help give you a little better structure to play with.

I do apologise, the *str(x)* wasn't meant to be using the array *x*. I was, ironically, trying to change how my code read to be more legible for my question. I see now that I really did a number on that. I'll take your suggestions and then probably fix all those mistakes I made in my question. — Eoin Mansfield, Apr 14 '17 at 10:30
Ok, yep I see what I did. When I put *x* in *str(x)* I meant to have put *Pass*, as in the variable that I am inputting into my function. — Eoin Mansfield, Apr 14 '17 at 10:44
OK, I thought of another problem. I didn't specify what the variable *Pass* that I am changing is. Also, I realised that I have not been clear about how the file names are set out. There are hundreds of file names, and the *Pass* variable is in the middle of the file name. After *Pass* the file name is constant (i.e. it always ends in '_stis_f25srf2_proj.fits'), but before *Pass* the file name changes from file to file, as this section contains the time the file was created (e.g. 'jup_16-ddd-hh-mm-ss_0030', with *ddd* being the day of the year, *hh* the hour, *mm* the minute, and *ss* the second). — Eoin Mansfield, Apr 14 '17 at 10:59
And *Pass* being a variable between 1-9, so a file name would read like this: 'jup_16-137-23-43-30_0030_v01_stis_f25srf2_proj.fits' With the 1 in the v01 being the *Pass* variable and the thing I want to search for using glob. — Eoin Mansfield, Apr 14 '17 at 11:08
FWIW `Pass` isn't a very good variable name in Python. For one, it's too similar to the keyword `pass` (which is probably why you made it uppercase). But uppercase variable names are also not very idiomatic (see: PEP-8), though this is of course a subjective question. — Iguananaut, Apr 14 '17 at 16:17
Also instead of concatenation and `str()` it's typically more readable and more efficient to use string formatting such as `'blah_%s_blah' % variable`. — Iguananaut, Apr 14 '17 at 16:18
It's still not clear why you'd be prepending `'FilePath'` to each filename returned by `glob()`. — Iguananaut, Apr 14 '17 at 16:20
The 'FilePath' was an artifact from previous iterations of the code. It's been removed. — Eoin Mansfield, Apr 14 '17 at 17:29
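
Putting the comments above together, the wildcard pattern can be built with string formatting and checked against the example file name without touching the disk. This is a sketch: `build_pattern` is a hypothetical helper, and `fnmatch` is used here because the `glob` module applies the same wildcard matching.

```python
import fnmatch


def build_pattern(n_pass):
    """Build the wildcard pattern for a given pass number (1-9)."""
    return '*_v0%s_stis_f25srf2_proj.fits' % n_pass


# Example file name quoted in the comments above
example = 'jup_16-137-23-43-30_0030_v01_stis_f25srf2_proj.fits'

print(build_pattern(1))
print(fnmatch.fnmatch(example, build_pattern(1)))  # same pass number
print(fnmatch.fnmatch(example, build_pattern(2)))  # different pass number
```

Checking the pattern this way separates "the pattern is wrong" from "the files could not be reached", which turned out to be the actual cause here.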
