JISAO data

NetCDF operators (NCOs) for file manipulation and simple calculations


A set of command-line utilities called netCDF operators (NCOs) are available on most of the linux machines, and Mac and PC versions can be downloaded. NCOs permit you to perform simple calculations and manipulations of netCDF or HDF4 files with only a minimal knowledge of the netcdf files. NCOs can be an order of magnitude faster than processing data with matlab or other analysis packages. A set of utilities with similar functionality has been developed at the National Center for Atmospheric Research, and it is called NCAR Command Language. I haven't used it, but I see that it will also make plots of your data.

Useful information on each NCO command can be obtained by just typing the name of the command in your linux session, for example, "ncatted" (no quotes), followed by carriage return. The NCO User's Guide is available in several formats: HTML (User's Guide Index in HTML) and PostScript, PDF, and other, and the users guide is very useful. I strongly urge you to check your calculations a little with other software, for example MATLAB, to make sure that NCO is doing what you think that it is doing. This is especially true with observational data where missing data, and the inumerable ways of writing missing data metadata, can cause you to get incorrect results if you are not careful. Also, if something doesn't work as the manual says it should, you should be sure that you have the latest version of the software. There is also a help forum that you can submit questions to.

Examples of commands:

  • Pick off a vertical level from NCEP / NCAR reanalysis data
  • Calculate a time mean or sum over a dimension
  • Calculate a thickness
  • Calculate a flux quantity
  • Change the variable type
  • Add or modify file attributes
  • Rename a variable, dimension, or attribute
  • Concatenating files (appending in time)
  • Fixing the time variable, Part I: Adding a record dimension.
  • Fixing the time variable, Part II
  • Subsetting a region or time, and decimating
  • Rearranging (flipping, reversing) latitudes
  • Calculating area averages (producing timeseries)
  • Calculate the temporal standard deviation
  • Eliminate a dimension

    Examples of combinations of commands used to perform common calculations:

  • Calculate a monthly climatology
  • Calculate the area between pairs of contours

    Please contribute to this WWW page. This file is /home/disk/margaret2/jisao/data/nco/index.html, and you should have write permission.


    Examples of commands:

    1) The usefulness of these routines is best demonstrated with an example. We get NCEP / NCAR reanalysis data from NOAA CDC in files where values of the variable, for example, geopotential height, are given for all 17 of the model levels. For simplicity it would be nice to have a file of just 500 mb geopotential height. Creating such a file can be done with:
    ncea -d level,6,6 -F hgt.mon.mean.nc hgt500.mon.mean.nc
    where:

  • "ncea" stands for netcdf ensemble averager
  • "-d level,6,6 -F" says take the average along the "level" dimension, and average from level 6 to level 6. The vertical levels in the NCEP NCAR reanalysis are:
    1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10
    and the 500mb level is the 6th level under FORTRAN-indexing (5th level under C-indexing). The "-F" specfication says to use FORTRAN-indexing.

    Return


    
    
    2) "ncra" could be used to calculate the January climatology of
    monthly mean data:
    ncra -F -d time,1,nmonths,12 hgt.mon.mean.nc hgt.mon.mean.clim.nc
    where it now takes the average of every twelfth month and "nmonths" is the total number of months in the file. More generally, the maps to average are specified by "-d dimension,minimum,maximum,stride".
    "-F" means use FORTRAN indexing (the numbering begins with 1).
    ncra -F -d time,1,,1 input_file output_file will calculate the time mean of a file.

    To sum over a dimension, for example to sum a file of daily precipitation to obtain an annual total
    ncra -h -O -y ttl in.nc out.nc
    will sum in time. I am not sure if this is because time is the record dimension in my file.

    Return


    
    3) Calculate thickness.
    ncea -F -d level,8,8 hgt.mon.mean.nc hgt300.mon.mean.nc
    ncea -F -d level,3,3 hgt.mon.mean.nc hgt850.mon.mean.nc
    The NCEP / NCAR reanalysis comes with data for 17 vertical levels. Pick off the data for the 300 and 850 mb pressure levels. -F means use FORTRAN-indexing (indexing begins with 1). C-indexing begins with 0.

    ncdiff hgt300.mon.mean.nc hgt850.mon.mean.nc thickness300850.mon.mean.nc
    Subtract hgt850 from hgt300, and write it out to thickness...

    Return


    
    4) A more complex calcultaion is to compute pentad-mean (5-day
    averages) horizontal momentum fluxes, the average of u'v', from the
    reanalysis daily-average files.  The time average momentum flux,
    [u'v'], can be written as:
    [u'v'] = [uv] - [u][v]
    where [] and ()' denote time averages and deviations from the time average respectively. In this case the time averages are the averages over 5 days.

    With the exception of ncrcat, there are always only one input and one output file in a NCO operation. One way to include two or more input variables in a calculation is to append additional variable files to a file. The reanalysis daily-average data is stored as one year per file. Append vwnd to the uwnd file:
    ncks -A vwnd.1948.nc uwnd.1948.nc

    Calculate the product uv for a year of daily-average data.
    ncap -s "uv=uwnd*vwnd" uwnd.1948.nc product.nc
    where product.nc will contain uv, uwnd, and vwnd.

    [ Brian Mapes provided another way to obtain the product in a file.
    ### rename variable vwnd to uwnd so ncbo will know to multiply them
    ncrename -v vwnd,uwnd vwnd.1948.nc v.1948.nc

    ### now multiply
    ncbo --op_typ=multiply uwnd.1948.nc v.1948.nc uvwnd.1948.nc

    ### rename the product back to uv, and put in file of same name uv
    ncrename -v uwnd,uv uvwnd.1948.nc uv.1948.nc

    ### mop up
    rm uvwnd.* v.*
    ]

    Beginning of pentad loop:
    Calculate the pentad means of uv.
    ncra -O -F -d time,day1,day2 product.nc product.pentad.nc
    where day1 and day2 are the first and last Julian days of the pentad, respectively.

    Calculate u'v'.
    ncap -s "upvp=uv-uwnd*vwnd" -v product.pentad.nc upvp.pentad.nc
    where -v forces only upvp (and not uv, uwnd, or vwnd) to be output to upvp.pentad.nc
    End of pentad loop

    Concatenate the pentad files in a single file for the year.
    ncrcat -h -O upvp.pentad.nc upvp.1948.nc

    Dan Vimont, now at the University of Wisconson, figured out this calculation. His cshell script for calculating pentad means is linked here (Dan's WWW page).

    Return


    
    5) How to change the variable type:
    The NCEP / NCAR reanalysis comes as short integers, and NCO tends to write outputs as floating point numbers (which take up twice as much disk space). To convert the floating point numbers back to packed integers, first look at the packing in the original files from CDC, and use these as the add_offset and scale_factor for packing.

    For air temperature the add_offset and scale_factor are 512.81 and 0.01, respectively:

    ncap -O -s "air=(air-512.81)/0.01" filename.nc temp.nc
    ncap -O -s "air=short(air);air@add_offset=512.81;air@scale_factor=0.01" temp.nc filename.nc

    Return


    
    6) For files to be intelligently handled by the Live Access Software,
    a file needs to have the following defined: units, long_name, and
    title.  In addition, I like to put in an extended history variable.

    ncatted -O -a units,air,c,c,"units goes here" filename.nc
    where "-a" is followed by "attribute name, variable name, mode (append, create, delete, modify, overwrite), attribute variable type (float, character, ...), attribute value"
    ncatted -O -a long_name,air,c,c,"long_name goes here" filename.nc
    ncatted -O -h -a title,global,o,c,"title goes here" filename.nc
    ncatted -O -h -a history,global,o,c,'history goes here' filename.nc
    "\n" (no quotes) can be used to put in a carriage return.

    Return


    
    7) This is handy when you obtain a file with unfortunate choices of
    variable, dimension, or attribute names.

    ncrename -h -O -v old_variable_name,new_variable_name filename.nc
    -h: do not add to the history variable
    -O: (upper case) overwrite the file.
    -d oldname,newname: to change a dimension name
    -a oldname,newname: to change an attribute name

    Return


    
    8) NCO differentiates between concatenating and appending.

    If you want to put two variables with the same time dimension into the same file, use
    ncks -h -A file_a file_b
    will put variables a and b into file file_b

    If you have yearly files that you want to concatenate, use
    ncrcat -h file_1979 file_1980 file_1981 file_197919801981
    ncrcat -h file_1979 file_198[01] file_197919801981
    should do the same thing.
    -h: do not add to the history variable.*

    * There is a special problem that can arise with the time dimension in the concatenated files. I write time as "so many units since some reference time" where the reference time is the first time period present in a file. ncrcat, at present, doesn't calculate the time correctly for files written with the above time prescription, and it provides a non-fatal error. I have come up with a matlab5 work-around where I write correct time values into the time variable inside a matlab session. For example,

    f = netcdf( filename, 'write' )
    f{'time'}(:) = correct_time_values;
    f{'time'}(penultimate:last) = (penultimate and last correct_time_values);
    % For reasons that make no sense to me I had to do the previous line.
    ncclose( filename )
    % You have to "ncclose" the file to write the changes.

    Return


    
    9) Fixing the time variable, Part I: Adding a record dimension to a file.

    The first question you are asking is "what is a record dimension?" It is becoming common that netCDF files are written for individual months as opposed to larger files with data for a span of months or years. In order to concatenate the files into a single, larger file (see above) with the nco utilities, you need to add a "record dimension" to each file. The triplet of nco commands you need to do this are given on the NCO documentation WWW page (here).

    The original file, "in.nc", does not have a record dimension.

    ncks --mk_rec_dmn time in.nc out.nc
    makes the dimension "time" the record dimension.

    The old way to do this (pre 4.0.1) was:

    ncecat -O -h in.nc out.nc
    ncpdq -O -h -a time,record out.nc out.nc
    ncwa -O -h -a record out.nc out.nc

    Return


    
    9b) Fixing the time variable, Part
    II.

    Sometimes the file has a time variable that has a value that you want to use in the fixed file. Consider the header and time value of the following file. As with Part I, you need to make time a "record dimension" so that NCOs will concatenate files.

    netcdf filename {
    dimensions:
            time = 1 ;
            lat = 89 ;
            lon = 180 ;
    variables:
            float time(time) ;
                    time:units = "days since 1854-01-15" ;
            float data(time,lat,lon)
            ...
    data:
     time = 15 ;
    }
    
    You want to preserve the time value as you convert "time" to being a record dimension.
    
    set str1 = 'time = 1 ;'
    set str2 = 'time = UNLIMITED ; // (1 currently)'
    ncdump in.nc  | sed -e "s#^.$str1# $str2#" | ncgen -o out.nc
    
    
    This script dumps the netcdf file, swaps the time dimension of 1 for "unlimited" time currently 1, and generates a new netCDF file. Someone far more clever than me figured this out. Now you can use NCOs to concatenate files.

    Return


    
    10) Subsetting a region or time, and decimating.

    Subsetting a region of an array is handled differently than subsetting time.

    i) Subsetting a region

    Say you have a global dataset, and you only want the data for the northeast portion of the Pacific Ocean.

    ncea -d lat,minimum_lat,maximum_lat -d lon,minimum_lon,maximum_lon in.nc out.nc

    where "lat" is what latitudes are called your file, and minimum_lat and maximum_lat are latitudes. Integer latitudes/longitudes are treated as indices, and floating point latitudes/longitudes are actual latitudes/longitudes. An example is in the NCO documentation.

    If you are using a dateset with wrapped coordinates (sometimes called cyclical boundary conditions), for example the longitudes in a global dataset, and you want to subset across the step jump in the coordinate, ncks will perform the subsetting. An example is where the dataset longitudes span 0 to 360 degrees, and you want to subset a region that includes the Greenwich Meridian.

    ncks -d lon,minimum_lon,maximum_lon in.nc out.nc

    An example of the above would be a subset for the Sahel in a dataset where longitudes span 0 to 360:
    ncks -d lon,340.0,10.0 -d lat,10.0,20.0 in.nc out.nc

    ii) Subsetting time.

    Subsetting in time can be performed by specifying the actual time written as floating point numbers or the time index values written as integers. Type "ncdump -v time filename.nc" (no quotes) to see what the time values are. If specifying time index values, you need to include the "-F" option if you are counting the first time value as "1". See the NCO documentation. [ Andres Roubicek pointed out the first method. ]

    ncea -F -d time,first,last in.out out.nc

    iii) Decimating in time or space.

    One of the definitions of the word "decimate" is to remove every tenth member of a set, and in data processing the term has come to mean retaining only the nth temporal or spatial gridploint of a dataset. An example of this is if the input file has 3-hourly data and you want to pick off the 00Z observations. This is accomplished with:

    ncks -F -d time,1,,8 input.nc output.nc

    where "-F" specificies that the counting begins with 1 as opposed to 0, and every 8th time record is kept, beginning with the first record.

    
    
    

    Return


    
    11) Reversing (flipping, rearranging) the latitudes in a file.

    For most applications it doesn't matter if the data is arranged in the file from southernmost to northernmost latitudes or from northernmost to southernmost latitudes, but it does matter if you are calculating spatial derivatives of a field (the curl in particular). The following will rearrange the latitudes of the data in a file. See the examples in the NCO documentation for more information on what can be done.

    ncpdq -O -h -a -lat filename.nc filename.nc

    where "-a -lat" means arrange the latitudes by reversing them.

    Return


    
    12) Calculating area-averages (producing timeseries)

    In the following, "ncwa" is employed to take spatial averages of data. NCOs recognize "_FillValue" but not "missing_value" as the attribute name for missing values. You can add a _FillValue attribute with "ncatted -O -h -a _FillValue,variablename,o,attribute_type,value in.nc out.nc" or rename all the missing_value attributes in a file to _FillValue with "ncrename -a .missing_value,_FillValue in.nc out.nc" (relevant NCO documentation). It is crucial that the attribute type be consistent with the attribute value. The NCO documentation recommends that you have define both _FillValue and missing_value, which seems like a smart idea. All in all I think that it is wise to check your calculation with matlab while you are building confidence in what NCOs are doing and how it is interpreting the file metadata.

    Calculate your own "nino3.4" SST index!
    ncwa -O -a lat,lon -d lat,-5.0,5.0 -d lon,190.0,240.0 in.nc nino34.nc

  • Do a ncdump of the the longitude variable to see how longitudes are specified: -180 to 180, or 0 to 360, or something else.
  • You need to specify the latitudes and longitudes with decimal points or else NCO will interpret the inputs as matrix indices.

    Global datasets tend to have longitude ranges of 0 to 360 or -180 to 180. In these respective organizations, there is a way to calculate a mean for a region that includes the Greenwich Meridian or the Dateline. For the case of longitudes spaning 0 to 360, a mean which includes the Greenwich Meridian is calculated with the following.

    ncks -d lon,lon_minimum,lat_minimum in.nc out.nc
    ncwa -O -a lat,lon out.nc out.nc

    where lon_minimum>lon_maximum. An example of this calculation would be an average for the Sahel:

    ncks -d lon,340.0,10.0 -d lat,10.0,20.0 in.nc out.nc
    ncwa -O -a lat,lon out.nc out.nc

    I haven't tried the case of longitudes organized from -180 to 180. One should look at the output of the ncks operation to be sure that you are getting the longitudes that you want.

    To calculate an area-averaged index, you first need to add the area weights to the file.
    ncap -h -O -s "weights=cos(lat*3.1415/180)" in.nc in.nc
    ncwa -h -O -w weights -a lat,lon in.nc global_mean.nc

    You can also mask some of the grid:
    ncwa -h -O -B logical_expression -w weights -a lat,lon in.nc out.nc
    The "-B" option stands for binary and an example of the logical_expression could be 'lat > 20' (with single quotes and spaces: not 'lat>20') to calculate the mean for 20-90N. There are other possibilities in the ncwa examples part of the users guide.

    Another way to do this is in two steps:
    ncks -C -v maskvariablename -A mask.nc in.nc % Adds the mask to out.nc
    ncwa -O -h -w maskvariablename -a lat,lon in.nc out.nc

    
    
    
    Return
    
    13) Calculate the temporal standard deviation

    See the NCO manual on this but, in short, the commands are:

    Calculate the time mean of variable_name.
    ncwa -O -v variable_name -a time in.nc out.nc

    Calculate the deviations with respect to the mean.
    ncbo -O -v variable_name in.nc out.nc out.nc
    Sum the square of the deviations, divide by (N-1), and take the square root
    ncra -O -y rmssdn out.nc out.nc

    
    
    Return
    
    14) Eliminate a dimension

    This may seem like an odd thing to need to do until you need to do it. I wrote a landmask file that unfortunately included time as a dimension landmask(time,lat,lon). This caused confusion when I tried to calculate averages of a variable only over land. To eliminate time, I did the following:

    ncwa -a variable_to_eliminate in.nc out.nc

    Thank you to Henry Butowski!

    
    
    Return
    December 2014
    Todd Mitchell ( mitchell@atmos.washington.edu )
    JISAO data