
I have a 60GB FITS file containing a binary table. I would like to read (and process) this table one row/entry/line/block* at a time.

(*I'm unsure of the correct nomenclature)

I am using pyfits and what I would like to do boils down to simply:

```
import pyfits

hdulist = pyfits.open("file.fits")
# the binary table is in the first extension (the second HDU),
# hence it is in hdulist[1]

n_entries = hdulist[1].header['NAXIS2']

for i in xrange(n_entries):
    entry = hdulist[1].data[i]  # I am confused what happens at this step

    # now do stuff with the values in entry
    # .....
```

The variable `entry` is of type `<class 'pyfits.fitsrec.FITS_record'>` and its length equals the number of columns in the binary table. However, it appears that the whole binary table is read into memory at the line `entry = hdulist[1].data[i]`.

I have looked through the pyfits documentation, but I can't find any method that reads data from a binary table extension one row at a time (or a small set of rows at a time). I don't want to select particular rows from the table, just scan through all of them in order.
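For context on what "one row at a time" means on disk: in a FITS binary table, each row is a fixed-width, big-endian record of NAXIS1 bytes, and the NAXIS2 rows sit back to back after the extension header, with headers and data padded to 2880-byte blocks. An in-order scan therefore only ever needs one row in memory. The following is a minimal standard-library sketch of that idea, not a pyfits feature; the helper names `read_header`, `data_size` and `iter_bintable_rows` are hypothetical, the header parsing is deliberately naive, and scaled columns and variable-length array columns (the heap) are not handled.

```python
from typing import BinaryIO, Dict, Iterator

BLOCK = 2880  # FITS headers and data units are padded to 2880-byte blocks
CARD = 80     # each header card is 80 ASCII characters


def read_header(f: BinaryIO) -> Dict[str, str]:
    """Read one HDU header (whole blocks up to END); return keyword -> raw value."""
    cards: Dict[str, str] = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            raise EOFError("truncated FITS header")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            key = card[:8].strip()
            if key == "END":
                return cards
            if card[8:10] == "= ":
                # Naive: drop a trailing comment; fine for the integer keywords used here.
                cards[key] = card[10:].split("/")[0].strip()


def data_size(h: Dict[str, str]) -> int:
    """Size in bytes of the HDU's data unit, rounded up to whole blocks."""
    naxis = int(h.get("NAXIS", "0"))
    if naxis == 0:
        return 0
    n = 1
    for k in range(1, naxis + 1):
        n *= int(h[f"NAXIS{k}"])
    bits = abs(int(h["BITPIX"]))
    size = bits // 8 * int(h.get("GCOUNT", "1")) * (int(h.get("PCOUNT", "0")) + n)
    return -(-size // BLOCK) * BLOCK  # ceiling to a multiple of BLOCK


def iter_bintable_rows(path: str, ext: int = 1) -> Iterator[bytes]:
    """Yield the raw bytes of each row of HDU number `ext`, one row at a time."""
    with open(path, "rb") as f:
        for _ in range(ext):  # skip earlier HDUs by seeking past their data
            h = read_header(f)
            f.seek(data_size(h), 1)
        h = read_header(f)
        row_len, n_rows = int(h["NAXIS1"]), int(h["NAXIS2"])
        for _ in range(n_rows):
            row = f.read(row_len)  # only one row is held in memory
            if len(row) != row_len:
                raise EOFError("truncated table data")
            yield row
```

Each yielded row is raw bytes; for a table with one `1J` and one `1E` column, for instance, `struct.unpack(">if", row)` would decode it.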

I guess my questions are:

0) What is happening at the `hdulist[1].data[i]` step? Why is everything being read into memory, and is there some way around this?

1) Have I missed something, and can pyfits actually do what I want?

2) Is there another Python library that can do this (i.e., read a binary table in a FITS extension row by row)?

3) If not, can I rewrite the data in a different, non-FITS binary (or other compressed, non-ASCII) format and use some other Python library or module to do what I want?

alexabate

1 Answer


pyfits currently lacks a row iterator for tables. If the data columns require no conversion from their on-disk storage format to their "physical" values, reading tables is fast. Otherwise, memory use currently blows up when you try to read such columns, because the whole column is converted at once. I wouldn't fight it too much, as the table interface is being rewritten; in the meantime, you might want to try the fitsio library, a Python wrapper around CFITSIO that provides efficient row-based iteration over tables.
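One common example of such a conversion is column scaling with the FITS keywords `TSCALn` and `TZEROn`, where physical value = `TZEROn` + `TSCALn` × stored value (unsigned 16-bit integers, stored as signed integers with `TZERO = 32768`, are a typical case). The arithmetic is element-wise, so in principle it can be applied to one row at a time. A minimal sketch, assuming the row's `struct` format string and the per-column scale and zero values are already known from the header; `to_physical` is a hypothetical helper, not part of pyfits or fitsio.

```python
import struct
from typing import Sequence, Tuple


def to_physical(raw_row: bytes, fmt: str,
                tscal: Sequence[float], tzero: Sequence[float]) -> Tuple[float, ...]:
    """Apply physical = TZERO + TSCAL * stored to every column of one raw row."""
    stored = struct.unpack(fmt, raw_row)  # big-endian on-disk values, e.g. fmt=">hh"
    if not (len(stored) == len(tscal) == len(tzero)):
        raise ValueError("need one TSCAL and one TZERO per column")
    return tuple(z + s * v for v, s, z in zip(stored, tscal, tzero))
```

For example, `to_physical(struct.pack(">hh", -32768, 32767), ">hh", [1, 1], [32768, 32768])` maps the signed stored values to the unsigned range `(0, 65535)`.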

Iguananaut
Thanks! I downloaded and installed [fitsio](https://pypi.python.org/pypi/fitsio/0.9.3) and it does what I wanted. The code above becomes:

```
import fitsio

filename = "file.fits"
h = fitsio.read_header(filename, ext=1)
n_entries = h["NAXIS2"]

fits = fitsio.FITS(filename, iter_row_buffer=1000)
for i in xrange(n_entries):
    entry = fits[1][i]
```

– alexabate Apr 09 '14 at 20:56
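fitsio's documentation shows `iter_row_buffer` being used when iterating over the table HDU directly (`for row in fits[1]: ...`), so that form may make better use of the buffer than indexing row by row. The buffering idea itself is simple: read many rows per `read()` call, then hand them out one at a time, so memory stays bounded by roughly `buffer_rows × NAXIS1` bytes. Below is a hypothetical standard-library sketch of that general technique, not fitsio's implementation; it assumes `f` is an open binary file already positioned at the start of the table data.

```python
from typing import BinaryIO, Iterator


def iter_rows_buffered(f: BinaryIO, row_len: int, n_rows: int,
                       buffer_rows: int = 1000) -> Iterator[bytes]:
    """Yield n_rows fixed-width records, reading buffer_rows rows per read() call."""
    if buffer_rows < 1:
        raise ValueError("buffer_rows must be at least 1")
    remaining = n_rows
    while remaining > 0:
        k = min(buffer_rows, remaining)
        chunk = f.read(k * row_len)  # one read for up to buffer_rows rows
        if len(chunk) != k * row_len:
            raise EOFError("truncated table data")
        for i in range(k):  # hand rows out one at a time from the in-memory chunk
            yield chunk[i * row_len:(i + 1) * row_len]
        remaining -= k
```

A larger buffer means fewer read calls at the cost of more memory; with 1000 rows the peak memory use is still tiny compared with a 60GB table.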