JISAO data

NetCDF operators (NCOs) for file manipulation


A set of utilities called netCDF operators (NCOs) is available on most of the linux machines. They let you perform simple calculations and manipulations of netCDF or HDF4 files at the operating system (for example, linux) level with only a minimal knowledge of the netCDF files.

Useful information on each command can be obtained by typing the name of the command in your linux session, for example, "ncatted" (no quotes), followed by carriage return. The NCO User's Guide is available in several formats (HTML, PostScript, PDF, and others). The guide is very useful, subject to the caveat that not all of the features it describes have actually been implemented, so you are encouraged to check the calculations to build confidence in what the operators are doing. There is also a Wiki with a cookbook of how to do specific calculations with NCOs, and there is a users forum where you can ask questions.

Examples of commands:

  • Pick off a vertical level from NCEP / NCAR reanalysis data
  • Calculate a time mean
  • Calculate a thickness
  • Calculate a flux quantity
  • Change the variable type
  • Add or modify file attributes
  • Rename a variable
  • Concatenating files (appending in time)
  • Fixing the time variable, Part I: Adding a record dimension.
  • Fixing the time variable, Part II
  • Subsetting a region or time
  • Rearranging (flipping, reversing) latitudes in a file

    Examples of combinations of commands used to perform common calculations:

  • Calculate a monthly climatology

    Please contribute to this WWW page. This file is /home/disk/margaret2/jisao/data/nco/index.html, and you should have write permission.


    Examples of commands:

    1) The usefulness of these routines is best demonstrated with an example. We get NCEP / NCAR reanalysis data from NOAA CDC in files where values of the variable, for example, geopotential height, are given for all 17 of the model levels. For simplicity it would be nice to have a file of just 500 mb geopotential height. Creating such a file can be done with:
    ncea -d level,6,6 -F hgt.mon.mean.nc hgt500.mon.mean.nc
    where:

  • "ncea" stands for netCDF ensemble averager. With a single input file there is nothing to average across, so the selected data are simply copied to the output file.
  • "-d level,6,6 -F" restricts the operation to the "level" dimension, from level 6 to level 6. The vertical levels in the NCEP / NCAR reanalysis are:
    1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10
    and the 500 mb level is the 6th level under FORTRAN-indexing (5th level under C-indexing). The "-F" specification says to use FORTRAN-indexing.
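
    Counting levels by hand is easy to get wrong, so here is a minimal sh sketch that looks up the FORTRAN-style index from the level list above and prints the matching ncea command. The helper name level_index is made up for this page and is not an NCO command; the sketch only prints the command, it does not run it.

```shell
# Pressure levels (mb) in file order, copied from the list above.
LEVELS="1000 925 850 700 600 500 400 300 250 200 150 100 70 50 30 20 10"

# Print the 1-based (FORTRAN-style, for use with -F) index of level $1.
# level_index is a hypothetical helper, not part of NCO.
level_index() {
    i=1
    for p in $LEVELS; do
        if [ "$p" = "$1" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "level $1 not found" >&2
    return 1
}

n=$(level_index 500)
# Print (do not run) the command for the 500 mb level.
echo "ncea -F -d level,$n,$n hgt.mon.mean.nc hgt500.mon.mean.nc"
```

    For C-indexing (no "-F"), subtract 1 from the index that is printed.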


    
    
    2) "ncra" can be used to calculate the January climatology of monthly mean data (this assumes the first record in the file is a January):
    ncra -F -d time,1,nmonths,12 hgt.mon.mean.nc hgt.mon.mean.clim.nc
    where it takes the average of every twelfth month, starting with the first record, and "nmonths" is the total number of months in the file. More generally, the maps (time records) to average are specified by "-d dimension,minimum,maximum,stride".
    "-F" means use FORTRAN indexing (the numbering begins with 1).
    To calculate the time mean of a whole file, use:
    ncra -F -d time,1,,1 input_file output_file
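
    To see which records a given "minimum,maximum,stride" picks out, the following sh sketch lists the 1-based record numbers. The helper selected_records is hypothetical (it is not an NCO command), and the file length of 36 months is an assumed example value.

```shell
# Print the 1-based record numbers selected by "-F -d time,start,end,stride".
# selected_records is a hypothetical helper used only for illustration.
selected_records() {
    start=$1; end=$2; stride=$3
    r=$start
    out=""
    while [ "$r" -le "$end" ]; do
        out="$out$r "
        r=$((r + stride))
    done
    echo "${out% }"
}

nmonths=36   # assumed: three years of monthly means, starting in January
selected_records 1 "$nmonths" 12     # records averaged for January
selected_records 7 "$nmonths" 12     # records averaged for July
```

    Starting the stride at record 7 instead of record 1 gives the July climatology in the same way.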


    
    3) Calculate thickness.
    ncea -F -d level,8,8 hgt.mon.mean.nc hgt300.mon.mean.nc
    ncea -F -d level,3,3 hgt.mon.mean.nc hgt850.mon.mean.nc
    The NCEP / NCAR reanalysis comes with data for 17 vertical levels. Pick off the data for the 300 and 850 mb pressure levels. -F means use FORTRAN-indexing (indexing begins with 1). C-indexing begins with 0.

    ncdiff hgt300.mon.mean.nc hgt850.mon.mean.nc thickness300850.mon.mean.nc
    Subtract hgt850 from hgt300, and write it out to thickness...


    
    4) A more complex calculation is to compute pentad-mean (5-day average) horizontal momentum fluxes, the average of u'v', from the reanalysis daily-average files. The time average momentum flux, [u'v'], can be written as:
    [u'v'] = [uv] - [u][v]
    where [] and ()' denote time averages and deviations from the time average, respectively. In this case the time averages are the averages over 5 days.
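
    The identity can be checked numerically before trusting it with real data. This sh/awk sketch uses five made-up daily values of u and v (they are not reanalysis data) and prints [uv] - [u][v] next to the mean of u'v' computed directly from the deviations. The helper name flux_check is hypothetical.

```shell
# Five made-up daily values of u and v (one pentad); not reanalysis data.
U="1 2 3 4 5"
V="2 4 1 3 5"

# Print [uv] - [u][v] and the mean of u'v' computed from the deviations.
flux_check() {
    awk -v us="$1" -v vs="$2" 'BEGIN {
        n = split(us, u, " "); split(vs, v, " ")
        for (i = 1; i <= n; i++) { su += u[i]; sv += v[i]; suv += u[i] * v[i] }
        ubar = su / n; vbar = sv / n
        for (i = 1; i <= n; i++) { sdev += (u[i] - ubar) * (v[i] - vbar) }
        printf "%.4f %.4f\n", suv / n - ubar * vbar, sdev / n
    }'
}

flux_check "$U" "$V"    # prints "1.0000 1.0000"
```

    The two numbers agree, which is why the flux can be computed from the pentad means of uv, u, and v without ever forming the deviations.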

    ncap works on the variables in a single input file and writes a single output file. One way to include two or more input variables in a calculation is to append the additional variables from their own files to one file. The reanalysis daily-average data is stored as one year per file. Append vwnd to the uwnd file:
    ncks -A vwnd.1948.nc uwnd.1948.nc

    Calculate the product uv for a year of daily-average data.
    ncap -s "uv=uwnd*vwnd" uwnd.1948.nc product.nc
    where product.nc will contain uv, uwnd, and vwnd.

    Beginning of pentad loop:
    Calculate the pentad means of uv.
    ncra -O -F -d time,day1,day2 product.nc product.pentad.nc
    where day1 and day2 are the first and last Julian days of the pentad, respectively.

    Calculate [u'v'] for the pentad.
    ncap -s "upvp=uv-uwnd*vwnd" -v product.pentad.nc upvp.pentad.nc
    where product.pentad.nc holds the pentad means [uv], [u], and [v], and -v forces only upvp (and not uv, uwnd, or vwnd) to be output to upvp.pentad.nc.
    End of pentad loop

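    Concatenate the pentad files in a single file for the year. The loop as written reuses one file name, so to concatenate, each pentad's result has to be kept in its own file and all of those files listed before the output file; a single input file is shown here for brevity.
    ncrcat -h -O upvp.pentad.nc upvp.1948.nc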
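
    A minimal sh sketch of the pentad loop follows. It is a dry run: it only prints the commands so you can check them first. The numbered file names (product.pentad.NN.nc, upvp.pentad.NN.nc) and the helper pentad_range are assumptions made for this sketch, and this is not Dan's script. It assumes 73 pentads of 5 days, so a leap year such as 1948 has one day left over; how to treat that day is left open.

```shell
# Print "day1 day2" (1-based days of the year) for pentad number $1 (1..73).
# pentad_range is a hypothetical helper, not part of NCO.
pentad_range() {
    echo "$(( ($1 - 1) * 5 + 1 )) $(( $1 * 5 ))"
}

# Dry run for the first three pentads; use 1..73 for a whole year.
for p in 1 2 3; do
    range=$(pentad_range "$p")
    day1=${range% *}
    day2=${range#* }
    tag=$(printf '%02d' "$p")      # 01, 02, ... keeps the files in order
    echo "ncra -O -F -d time,$day1,$day2 product.nc product.pentad.$tag.nc"
    echo "ncap -s \"upvp=uv-uwnd*vwnd\" -v product.pentad.$tag.nc upvp.pentad.$tag.nc"
done
# The ?? wildcard is expanded by the shell when the command is actually run.
echo "ncrcat -h -O upvp.pentad.??.nc upvp.1948.nc"
```

    Once the printed commands look right, the output can be piped to sh to run them.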

    Dan Vimont, now at the University of Wisconsin, figured out this calculation. His cshell script for this calculation is linked here (Dan's WWW page).


    
    5) How to change the variable type:
    The NCEP / NCAR reanalysis comes as short integers, and NCO tends to write outputs as floating point numbers (which take up twice as much disk space). To convert the floating point numbers back to packed integers, first look at the packing in the original files from CDC, and use these as the add_offset and scale_factor for packing.

    For air temperature the add_offset and scale_factor are 512.81 and 0.01, respectively:

    ncap -O -s "air=(air-512.81)/0.01" filename.nc temp.nc
    ncap -O -s "air=short(air);air@add_offset=512.81;air@scale_factor=0.01" temp.nc filename.nc
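
    It is worth checking the packing arithmetic on one value before converting a whole file. This sh/awk sketch packs and unpacks a single temperature using the add_offset and scale_factor above. The helper names pack_value and unpack_value are hypothetical, and rounding to the nearest integer is this sketch's choice; it may not match what "short()" does inside ncap.

```shell
# Pack a value as (value - add_offset) / scale_factor, rounded to the
# nearest integer (the rounding is an assumption of this sketch).
pack_value() {
    awk -v x="$1" -v off="$2" -v sf="$3" 'BEGIN { printf "%.0f\n", (x - off) / sf }'
}

# Unpack as packed * scale_factor + add_offset.
unpack_value() {
    awk -v p="$1" -v off="$2" -v sf="$3" 'BEGIN { printf "%.2f\n", p * sf + off }'
}

packed=$(pack_value 273.15 512.81 0.01)
echo "packed:   $packed"
echo "unpacked: $(unpack_value "$packed" 512.81 0.01)"

# A short integer holds -32768..32767, so check that the packed value fits.
if [ "$packed" -ge -32768 ] && [ "$packed" -le 32767 ]; then
    echo "fits in a short"
fi
```

    If a packed value falls outside the range of a short integer, the add_offset and scale_factor are not suitable for that variable.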


    
    6) For files to be intelligently handled by the Live Access Server software, a file needs to have the following defined: units, long_name, and title. In addition, I like to put in an extended history attribute.

    ncatted -O -a units,air,c,c,"units goes here" filename.nc
    where "-a" is followed by "attribute name, variable name, mode (append, create, delete, modify, overwrite), attribute type (float, character, ...), attribute value".
    ncatted -O -a long_name,air,c,c,"long_name goes here" filename.nc
    ncatted -O -h -a title,global,o,c,"title goes here" filename.nc
    ncatted -O -h -a history,global,o,c,'history goes here' filename.nc
    "\n" (no quotes) can be used to put in a carriage return.


    
    7) For files to be accepted by the Live Access Server software, a file cannot use the word "data" as a variable name. Renaming the variable with

    ncrename -h -O -v old_variable_name,new_variable_name filename.nc
    will make the Live Access Server software happy, where:
    -h: do not add to the history attribute.
    -O: (upper case) overwrite the file.


    
    8) NCO differentiates between concatenating and appending.

    If you want to put two variables with the same time dimension into the same file, use
    ncks -h -A file_a file_b
    which will put variables a and b into file_b.

    If you have yearly files that you want to concatenate, use
    ncrcat -h file_1979 file_1980 file_1981 file_197919801981
    The wildcard form
    ncrcat -h file_1979 file_198[01] file_197919801981
    should do the same thing.
    -h: do not add to the history attribute.*

    * There is a special problem that can arise with the time dimension in the concatenated files. I write time as "so many units since some reference time", where the reference time is the first time period present in a file. ncrcat, at present, doesn't calculate the time correctly for files written this way, and it reports a non-fatal error. I have come up with a matlab5 work-around in which I write the correct time values into the time variable inside a matlab session. For example,

    f = netcdf( filename, 'write' );
    f{'time'}(:) = correct_time_values;
    f{'time'}(penultimate:last) = correct_time_values(penultimate:last);
    % For reasons that make no sense to me I had to do the previous line.
    ncclose( filename )
    % You have to "ncclose" the file to write the changes.
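
    Returning to the wildcard form of the ncrcat command in this example: it is the shell, not ncrcat, that expands file_198[01] into the matching file names. This sh sketch shows the expansion using empty stand-in files in a scratch directory (they are not real netCDF files), and it only prints the ncrcat command.

```shell
# Make empty stand-in files in a scratch directory (not real netCDF data).
tmpdir=$(mktemp -d)
touch "$tmpdir/file_1979" "$tmpdir/file_1980" "$tmpdir/file_1981" "$tmpdir/file_1982"

# The shell expands file_198[01] to the existing names that match,
# in sorted order, before any command sees them.
expanded=$(cd "$tmpdir" && echo file_1979 file_198[01])
echo "ncrcat -h $expanded file_197919801981"

rm -rf "$tmpdir"
```

    Note that file_1982 is left out, because [01] matches only a final 0 or 1.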


    
    9) Fixing the time variable, Part I: Adding a record dimension to a file.

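    The first question you are asking is "what is a record dimension?" It is the unlimited dimension of a netCDF file, the one that can grow in length, and it is the dimension along which ncrcat joins files. It is becoming common that netCDF files are written for individual months as opposed to larger files with data for a span of months or years. In order to concatenate the files into a single, larger file (see above) with the NCO utilities, you need to add a "record dimension" to each file. The triplet of NCO commands you need to do this is given on the NCO documentation WWW page.

    The original file, "in.nc", does not have a record dimension. "out.nc", after the 3 operations, will contain the same data, but with a record dimension defined. ncecat adds a new record dimension named "record", ncpdq swaps the "record" and "time" dimensions so that "time" becomes the record dimension, and ncwa averages over "record" (now of length 1) to remove it.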

    ncecat -O -h in.nc out.nc
    ncpdq -O -h -a time,record out.nc out.nc
    ncwa -O -h -a record out.nc out.nc
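
    With many monthly files the triplet has to be repeated for each one. This sh sketch prints the three commands for each file in a list so they can be inspected before running. The helper record_dim_commands and the file names (sst.1979.01.nc and so on) are hypothetical.

```shell
# Print the three commands that give file $1 a record dimension, writing $2.
# record_dim_commands is a hypothetical helper; it prints, it runs nothing.
record_dim_commands() {
    echo "ncecat -O -h $1 $2"
    echo "ncpdq -O -h -a time,record $2 $2"
    echo "ncwa -O -h -a record $2 $2"
}

# Hypothetical monthly file names; output goes to NAME.rec.nc.
for f in sst.1979.01.nc sst.1979.02.nc; do
    record_dim_commands "$f" "${f%.nc}.rec.nc"
done
```

    Piping the printed lines to sh runs them; the resulting .rec.nc files can then be given to ncrcat.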


    
    9b) Fixing the time variable, Part II.

    Sometimes the file has a time variable that has a value that you want to use in the fixed file. Consider the header and time value of the following file. As with Part I, you need to make time a "record dimension" so that NCOs will concatenate files.

    netcdf filename {
    dimensions:
            time = 1 ;
            lat = 89 ;
            lon = 180 ;
    variables:
            float time(time) ;
                    time:units = "days since 1854-01-15" ;
            float data(time,lat,lon)
            ...
    data:
     time = 15 ;
    }
    
    You want to preserve the time value as you convert "time" to being a record dimension.
    
    set str1 = 'time = 1 ;'
    set str2 = 'time = UNLIMITED ; // (1 currently)'
    ncdump in.nc  | sed -e "s#^.$str1# $str2#" | ncgen -o out.nc
    
    
    This script dumps the netCDF file as text, replaces the fixed time dimension of length 1 with an unlimited time dimension (currently of length 1), and generates a new netCDF file from the edited text. Someone far more clever than me figured this out. Now you can use NCOs to concatenate files.
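
    The substitution can be tried on a few lines of text without touching a netCDF file. This is an sh (rather than csh) sketch of the same sed command, applied to sample lines shaped like ncdump output; it assumes the dimension line starts with a single tab. The function name make_time_unlimited is hypothetical.

```shell
# sh version of the substitution above, applied to text on standard input.
str1='time = 1 ;'
str2='time = UNLIMITED ; // (1 currently)'
make_time_unlimited() {
    # "^." matches the single leading character (a tab in ncdump output).
    sed -e "s#^.$str1# $str2#"
}

# Sample lines: the dimension line, another dimension, and the data line.
printf '\ttime = 1 ;\n\tlat = 89 ;\n time = 15 ;\n' | make_time_unlimited
```

    Only the dimension line is rewritten; the data line " time = 15 ;" does not match the pattern, so the time value is preserved.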


    
    10) Subsetting a region or time.

    Subsetting a region of an array is handled differently than subsetting time.

    i) Subsetting a region

    Say you have a global dataset, and you only want the data for the northeast portion of the Pacific Ocean.

    ncea -d lat,minimum_lat,maximum_lat -d lon,minimum_lon,maximum_lon in.nc out.nc

    where "lat" is what latitudes are called in your file, and minimum_lat and maximum_lat are latitudes. Include a decimal point in the minimum and maximum specifications: NCO treats a limit with a decimal point as a coordinate value and a limit without one as an array index. An example is in the NCO documentation.

    ii) Subsetting time.

    It appears that subsetting in the time dimension requires you to specify the array position and not time itself.

    ncea -F -d time,first,last in.nc out.nc

    where "first" and "last" are the array positions and not actual times. Also, if you are calling the first map "1", use the "-F" option. See the NCO documentation.
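
    For monthly data the array positions can be worked out from the year and month. This sh sketch assumes a monthly file whose first record is January of a known year (1948 is used as an example), and the helper month_index is hypothetical. It prints the ncea command rather than running it.

```shell
# 1-based record number of a given year and month in a monthly file.
# month_index is a hypothetical helper; $1 is the year of the file's
# first record, which is assumed to be a January. Give the month
# without a leading zero (8, not 08).
month_index() {
    first_year=$1; year=$2; month=$3
    echo $(( (year - first_year) * 12 + month ))
}

first=$(month_index 1948 1979 1)     # January 1979
last=$(month_index 1948 1988 12)     # December 1988
echo "ncea -F -d time,$first,$last in.nc out.nc"
```

    These are FORTRAN-style positions, so the "-F" option is needed, as noted above.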


    
    11) Reversing (flipping, rearranging) the latitudes in a file.

    For most applications it doesn't matter if the data is arranged in the file from southernmost to northernmost latitudes or from northernmost to southernmost latitudes, but it does matter if you are calculating spatial derivatives of a field (the curl in particular). The following will rearrange the latitudes of the data in a file. See the examples in the NCO documentation for more information on what can be done.

    ncpdq -O -h -a -lat filename.nc filename.nc

    where "-a -lat" means arrange the latitudes by reversing them.

    
    

    June 2009
    Todd Mitchell ( mitchell@atmos.washington.edu )