Reading FITS Files

Introduction

Most of the numerical SDSS-III data are stored as FITS (Flexible Image Transport System) files. A FITS file can contain both images and binary data tables in a well-defined format. FITS files can be read and written from many programming languages; the ones most commonly used within SDSS-III are IDL and Python.
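To make "well-defined format" concrete: a FITS header is plain ASCII. It is made of 80-character keyword cards packed into 2880-byte blocks and ends with an END card. The sketch below parses such a header using only the Python standard library. It is illustrative and not a replacement for the packages described below. The helper names (`parse_value`, `read_header`, `make_card`) are hypothetical, and the sketch skips details such as escaped quotes and CONTINUE cards.

```python
# Sketch: parse a FITS header with only the standard library.
# A header is a run of 80-character ASCII "cards" packed into 2880-byte
# blocks and terminated by a card whose keyword is END.
import io

BLOCK = 2880  # bytes per FITS logical record
CARD = 80     # characters per header card


def parse_value(text):
    """Return the value part of a card (columns 11-80) as a string, without comments."""
    text = text.strip()
    if text.startswith("'"):
        # Quoted string: take everything up to the closing quote.
        # (Simplification: escaped '' inside strings is not handled.)
        end = text.index("'", 1)
        return text[1:end].rstrip()
    return text.split("/", 1)[0].strip()


def read_header(f):
    """Read one header from a binary file object; return {keyword: value string}."""
    header = {}
    while True:
        block = f.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            key = card[:8].strip()
            if key == "END":
                return header
            if card[8:10] == "= ":  # value indicator in columns 9-10
                header[key] = parse_value(card[10:])


def make_card(key, value):
    """Hypothetical helper: build one fixed-format card for the demo below."""
    return f"{key:<8}= {value:>20}".ljust(CARD)


cards = [make_card("SIMPLE", "T"), make_card("BITPIX", "8"),
         make_card("NAXIS", "0"), "END".ljust(CARD)]
raw = "".join(cards).ljust(BLOCK).encode("ascii")
hdr = read_header(io.BytesIO(raw))
print(hdr)  # {'SIMPLE': 'T', 'BITPIX': '8', 'NAXIS': '0'}
```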

IDL

The Goddard utilities contain tools for reading and writing FITS files. The most commonly used functions are mrdfits and mwrfits. The Goddard utilities are included in the idlutils package, which also contains additional programs for manipulating FITS files.

Python

The PyFITS package handles the reading and writing of FITS files in Python. Version 2.4.0 or later is strongly recommended, because that version correctly reads and writes FITS checksum headers.
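The FITS checksum keywords are built on a 32-bit ones' complement sum of the HDU bytes, read as big-endian 32-bit integers. The standard-library sketch below computes that sum. It assumes the convention's definition of DATASUM as this sum taken over the data unit, stored as a decimal string. It does not produce the ASCII-encoded CHECKSUM value, and the example data unit is hypothetical.

```python
# Sketch: the 32-bit ones' complement sum behind the FITS DATASUM/CHECKSUM keywords.
import struct

BLOCK = 2880


def ones_complement_sum32(data: bytes) -> int:
    """Sum big-endian 32-bit words with end-around carry (ones' complement addition)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        # Fold the carry out of bit 31 back into the low 32 bits.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


# A hypothetical data unit: three 32-bit integers, zero-padded to one 2880-byte block.
data_unit = struct.pack(">3I", 0xFFFFFFFF, 1, 41).ljust(BLOCK, b"\0")
print(str(ones_complement_sum32(data_unit)))  # prints 42
```

The end-around carry is what distinguishes this from an ordinary modulo-2**32 sum. In the example, 0xFFFFFFFF + 1 overflows and the carry wraps back in to give 1, so the total is 42 rather than 41. A pure-Python loop like this is far too slow for multi-gigabyte files and is meant only to show the arithmetic.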

Another package is fitsio, developed by Erin Sheldon, which is a Python wrapper around the CFITSIO library. It gives direct access to individual columns of a FITS binary table, which is useful for reading large FITS files, as described below.
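Column access is cheap because a binary table is stored row by row with a fixed byte layout. Each row is NAXIS1 bytes, and a given column always occupies the same byte offset within the row, with values stored big-endian. The standard-library sketch below shows that layout. It is not how fitsio or CFITSIO are implemented, and the table contents and the `read_column` helper are hypothetical.

```python
# Sketch: pull one column out of a row-major FITS binary table.
# Each row is NAXIS1 bytes; a column sits at a fixed byte offset in every row,
# and all values are big-endian.
import struct


def read_column(table: bytes, nrows: int, row_width: int, offset: int, fmt: str):
    """Return the values of one fixed-width column from raw table bytes."""
    return [struct.unpack_from(fmt, table, row * row_width + offset)[0]
            for row in range(nrows)]


# Hypothetical two-column table: PLATE as TFORM 'J' (int32), Z as 'E' (float32).
row = struct.Struct(">if")  # 8 bytes per row, so NAXIS1 = 8
table = b"".join(row.pack(p, z) for p, z in [(3586, 0.5), (3587, 1.25), (3588, 2.0)])

plates = read_column(table, 3, row.size, 0, ">i")
z = read_column(table, 3, row.size, 4, ">f")
print(plates, z)  # [3586, 3587, 3588] [0.5, 1.25, 2.0]
```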

Large FITS Files

FITS files larger than about 2 GB can be harder to read. One such file is the spAll file. The simplest way to read a large FITS file is to install the fitsio Python module described above, which can read just a selected subset of columns:

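import fitsio
# Read only the listed columns; the remaining columns are never loaded.
columns = ['PLATE', 'MJD', 'FIBERID', 'Z', 'ZWARNING', 'Z_ERR']
# d is a NumPy structured array with one field per requested column
d = fitsio.read('spAll-v5_5_12.fits', columns=columns)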
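Reading a subset of columns pays off because the memory needed scales with the bytes per row actually requested. The arithmetic below uses the binary-table header keywords NAXIS1 (bytes per row), NAXIS2 (number of rows), and PCOUNT (heap size). The example numbers are hypothetical, not those of any real spAll file, and "2 GB" is taken to mean 2**31 bytes.

```python
# Sketch: estimate how many bytes a binary table occupies on disk,
# versus how many bytes a handful of selected columns would need.
BLOCK = 2880  # FITS data units are padded to a multiple of 2880 bytes


def data_unit_bytes(naxis1: int, naxis2: int, pcount: int = 0) -> int:
    """NAXIS1 bytes per row x NAXIS2 rows, plus PCOUNT heap bytes, padded to whole blocks."""
    raw = naxis1 * naxis2 + pcount
    return -(-raw // BLOCK) * BLOCK  # ceiling division, then back to bytes


def subset_bytes(naxis2: int, column_widths) -> int:
    """Bytes needed to hold only the selected columns for every row."""
    return naxis2 * sum(column_widths)


# Hypothetical numbers: 4000-byte rows, one million rows,
# and six selected columns of 4 bytes each.
full = data_unit_bytes(4000, 1_000_000)
subset = subset_bytes(1_000_000, [4] * 6)
print(full, subset, full > 2**31)  # 4000000320 24000000 True
```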

The PyFITS module has more demanding hardware requirements, because it must read the whole file before any of it can be used. On a 64-bit machine with more than 4 GB of memory, it is possible to use the memmap option:

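import pyfits
# memmap=True maps the file into virtual memory instead of reading it all at once
fx = pyfits.open('spAll-v5_5_12.fits', memmap=True)
# HDU 1 holds the binary table; the primary HDU (0) cannot contain a table
d = fx[1].data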
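The memmap option relies on memory mapping. The file is mapped into the process's address space, and the operating system reads pages from disk only when they are accessed. This is presumably why a 64-bit machine is needed for files of this size. The sketch below illustrates the mechanism with Python's standard `mmap` module on a small, hypothetical stand-in file. It does not show how PyFITS itself uses memory mapping.

```python
# Sketch: memory-map a file and read one row without loading the rest.
import mmap
import os
import struct
import tempfile

ROW = struct.Struct(">if")  # hypothetical row layout: int32 PLATE, float32 Z
DATA_START = 2880           # hypothetical: data begins after one header block

# Build a small stand-in file: one blank header block followed by 1000 rows.
path = os.path.join(tempfile.mkdtemp(), "demo_table.bin")
with open(path, "wb") as f:
    f.write(b" " * DATA_START)
    for i in range(1000):
        f.write(ROW.pack(i, i / 4))


def read_row(path, row):
    """Map the file read-only and decode a single row from it."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The operating system pages in only the part of the file this touches.
        return ROW.unpack_from(mm, DATA_START + row * ROW.size)


print(read_row(path, 998))  # (998, 249.5)
```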

In IDL, the routine hogg_mrdfits is available as part of the idlutils package. Like fitsio, it lets one specify a subset of columns to read. It avoids exhausting memory by reading only a subset of the rows of a FITS file at a time, extracting the requested columns, and then moving on to the next subset of rows.
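The chunked strategy described for hogg_mrdfits can be sketched in Python with the standard library. The steps are: seek to the start of the table data, read a block of rows, keep only the requested columns, and discard the rest before reading the next block. This is an analogue written for illustration, not a port of hogg_mrdfits. The function name, its parameters, and the example table are hypothetical.

```python
# Sketch: read selected columns a chunk of rows at a time, so peak memory
# is bounded by chunk_rows * row_width instead of the whole table.
import io
import struct


def read_columns_chunked(f, data_start, nrows, row_width, columns, chunk_rows=1024):
    """columns maps name -> (byte offset within row, struct format)."""
    out = {name: [] for name in columns}
    f.seek(data_start)
    for first in range(0, nrows, chunk_rows):
        n = min(chunk_rows, nrows - first)
        chunk = f.read(n * row_width)
        if len(chunk) < n * row_width:
            raise ValueError("file ended before the expected number of rows")
        for r in range(n):
            base = r * row_width
            for name, (offset, fmt) in columns.items():
                out[name].append(struct.unpack_from(fmt, chunk, base + offset)[0])
        # chunk is discarded here; only the extracted columns are kept
    return out


# Hypothetical table: PLATE (int32, 'J') and FIBERID (int16, 'I'), 6 bytes per row.
row = struct.Struct(">ih")
f = io.BytesIO(b" " * 2880 + b"".join(row.pack(3586, i) for i in range(1, 11)))
cols = read_columns_chunked(f, 2880, 10, row.size,
                            {"PLATE": (0, ">i"), "FIBERID": (4, ">h")}, chunk_rows=3)
print(cols["FIBERID"])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The `chunk_rows` parameter trades memory for the number of read calls. Larger chunks mean fewer reads but a larger temporary buffer.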