Accelerate NetCDF-4 read access

For simulating the (global) terrestrial biosphere I need time series of climate data, which is mostly provided as 3-dimensional NetCDF files with the dimensions longitude, latitude and time. If everything is in a single file, it is often several gigabytes in size, and extracting a time series at one position can take quite some time, since by default NetCDF files are optimized for extracting time slices. In the following I show how I speed up reading from a large NetCDF file along the time axis.

The vegetation model I work with (LPJ-GUESS) needs daily or monthly climate data (precipitation, temperature, radiation) for the locations to simulate. Locations can be one or more independent points or a regular gridlist up to global coverage. Normally the climate drivers are global gridded files at 0.5×0.5° resolution, which means, excluding the arctic regions, 720×278 points per time step. So far my first step was to get rid of all water points by converting to a 2-dimensional reduced grid, which cuts the file size almost in half. However, I just realized that this step might not be necessary anymore with the “shuffle” option, which rearranges the bytes so that runs of identical values (i.e. oceans/inland water) compress well, resulting in the same file size as the reduced grid. Nevertheless, to extract a time series at one location the file still needs to be read from beginning to end.

Until recently I used the following script, which makes use of nco commands to reorganize NetCDF-3 files. It first removes the unlimited time dimension and then permutes the dimensions, so that the data is ordered along the time axis.

for f in *.nc; do
  echo ${f}
  # add a degenerate "record" dimension
  ncecat -O ${input_dir}/${f} tmp/${f}
  # !!! next one needs at least 2*filesize RAM !!!
  # collapse the record dimension again, turning time into a fixed dimension
  ncwa -a record tmp/${f} tmp/${f}.tmp
  # permute dimensions; ${other_dim} and ${time_dim} must hold the dimension names
  ncpdq -F -O -a ${other_dim},${time_dim} tmp/${f}.tmp ${output_dir}/${f}
  rm -f tmp/*
done
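As an aside, the reduced-grid conversion mentioned above amounts to dropping all ocean cells and flattening (lat, lon) into a single land axis. A minimal numpy sketch with toy data (hypothetical mask logic, not the original preprocessing):

```python
import numpy as np

# toy 3-D field (time, lat, lon); NaN marks ocean cells (hypothetical data)
rng = np.random.default_rng(0)
field = rng.normal(size=(4, 3, 5)).astype(np.float32)
field[:, 0, :] = np.nan           # pretend the first latitude row is ocean

# keep only cells that carry data, flattening (lat, lon) into one land axis
land = ~np.isnan(field).any(axis=0)   # boolean mask, shape (lat, lon)
reduced = field[:, land]              # shape (time, n_land)

print(reduced.shape)   # (4, 10): 15 cells minus 5 ocean cells
```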

On the Unidata Developer’s Blog I found two entries that demonstrate the NetCDF-4 data structure and how to reorganize the NetCDF-4 internal data layout.

The chunking can be done with nccopy. For testing I chose a large file with daily temperature of a climate scenario time series (1950–2099) from the ISI-MIP Fasttrack project, consisting of 59199 land points and 54787 time steps with a total size of 13 GB. As you can see, the data is organized along the land_id dimension:

ncdump -hs ../
netcdf rcp6p0_ipsl-cm5a-lr_tas {
dimensions:
        land_id = 59199 ;
        time = UNLIMITED ; // (54787 currently)
variables:
        float tas(time, land_id) ;
                tas:_FillValue = -999.9f ;
                tas:standard_name = "air_temperature" ;
                tas:long_name = "Near surface air temperature at 2m" ;
                tas:units = "K" ;
                tas:coordinates = "lat lon" ;
                tas:_Storage = "chunked" ;
                tas:_ChunkSizes = 1, 59199 ;
                tas:_Endianness = "little" ;
}

Reordering the data is a very heavy I/O job, so I use the option ‘-w’ to keep the output in memory during the operation and ‘-s -d 1’ for a bit of compression (file size 7 GB). For testing I created 5 files with different chunk layouts: one optimized for the 4 MB chunk cache with an equal number of chunks in each dimension, a smaller subset of that, and three files optimized towards reading along the time axis. I found that it is faster to first create small chunks and then build the large 4 MB ones from the previous file. That was especially the case for the full permutation to optimize along the time axis when starting from the original file: done in one step, it didn’t finish within a day.
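The 1021×945 layout can be checked with a bit of arithmetic: each chunk should stay below the 4 MB chunk cache, and both dimensions should be split into the same number of chunks. A small sketch (assuming 4-byte floats):

```python
import math

n_land, n_time = 59199, 54787     # grid points and time steps of the test file
c_land, c_time = 1021, 945        # one of the chunk layouts used here
bytes_per_value = 4               # tas is a 4-byte float

# each chunk must fit into the default 4 MiB chunk cache ...
chunk_bytes = c_land * c_time * bytes_per_value

# ... and both dimensions end up split into the same number of chunks
chunks_land = math.ceil(n_land / c_land)
chunks_time = math.ceil(n_time / c_time)

print(chunk_bytes, chunk_bytes < 4 * 1024 * 1024)   # 3859380 True
print(chunks_land, chunks_time)                     # 58 58
```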

time nccopy -sw -d 1 -c"land_id/210,time/195" ../
real: 9m2.892s user: 5m18.012s sys: 0m20.508s (62.35%)

time nccopy -sw -d 1 -c"land_id/1021,time/945"
real: 7m25.134s user: 5m45.352s sys: 0m11.676s (80.20%)

time nccopy -sw -d 1 -c"land_id/180,time/2192" ../
real: 7m44.932s user: 5m44.420s sys: 0m12.300s (76.72%)

time nccopy -sw -d 1 -c"land_id/1,time/54787"
real: 86m25.114s user: 84m27.848s sys: 0m10.976s (97.95%)

time nccopy -sw -d 1 -c"land_id/18,time/54787"
real: 90m0.941s user: 87m35.824s sys: 0m14.420s (97.58%)
ncdump -h
dimensions:
	land_id = 59199 ;
	time = UNLIMITED ; // (54787 currently)
variables:
	float tas(time, land_id) ;
		tas:_ChunkSizes = 945, 1021 ;

Out of curiosity I also included one unreduced, 3-dimensional file (compression level 1; shuffled; size 11 GB, still larger than the reduced grid) with the following layout:

ncdump -hs ../
netcdf rcp6p0_ipsl-cm5a-lr_tas_gridded {
dimensions:
	lon = 720 ;
	lat = 360 ;
	time = UNLIMITED ; // (54787 currently)
variables:
	float tasAdjust(time, lat, lon) ;
		tasAdjust:_ChunkSizes = 38, 117, 234 ;
		tasAdjust:_DeflateLevel = 1 ;
		tasAdjust:_Shuffle = "true" ;
}

And here is a simple R script, which reads different shapes of the data and returns the time consumption (don’t forget to flush the file system cache, as described at the end of Chunking Data: Why it Matters; here this is done from within the R script, which therefore needs root permissions):

library(ncdf4)

# the reduced-grid files with different chunk layouts (paths shortened)
files <-  c("../",
            "", "",
            "", "")
res <- matrix(0.0, length(files), 3)
colnames(res) <- c("along.time", "along.land_id", "slice")
rownames(res) <- files
for (f in files) {
  ncin <- nc_open(f)
  # flush the file system cache before each read (requires root)
  system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
  res[f, 'along.time'] <- system.time(dummy <- ncvar_get(ncin, "tas", start=c(1, 1), count=c(1, NA)))[3]
  system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
  res[f, 'along.land_id'] <- system.time(dummy <- ncvar_get(ncin, "tas", start=c(1, 1), count=c(NA, 1)))[3]
  system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
  res[f, 'slice'] <- system.time(dummy <- ncvar_get(ncin, "tas", start=c(20000, 20000), count=c(1200, 1000)))[3]
  nc_close(ncin)
}

# the unreduced (gridded) file has a different variable name and 3 dimensions
f <- ""
files <- append(files, f)
res <- rbind(res, 0)
rownames(res) <- files
ncin <- nc_open(f)
system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
res[f, 'along.time'] <- system.time(dummy <- ncvar_get(ncin, "tasAdjust", start=c(1, 1, 1), count=c(1, 1, NA)))[3]
system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
res[f, 'along.land_id'] <- system.time(dummy <- ncvar_get(ncin, "tasAdjust", start=c(1, 1, 1), count=c(NA, NA, 1)))[3]
system("/bin/sync && /bin/echo 3 > /proc/sys/vm/drop_caches")
res[f, 'slice'] <- system.time(dummy <- ncvar_get(ncin, "tasAdjust", start=c(30, 30, 15), count=c(60, 60, 40)))[3]
nc_close(ncin)
file (chunk layout land_id/time)   along.time  along.land_id   slice
../ (original, 59199/1)               116.155          0.034   2.188
210/195                                 5.236          0.552   0.676
1021/945                                2.941          1.544   0.246
180/2192                                0.811          2.915   0.222
1/54787                                 0.057        225.566   4.567
18/54787                                0.127        114.263   2.639
../ (gridded, 3-dimensional)           51.292          0.290   0.213

As you can see, the more the file chunks are ordered along the time dimension, the faster the access to a time series at one location (real time in seconds), though only at the expense of the remaining dimensions. So one has to find an appropriate layout for the desired purpose. And it is still worth using the reduced 2-dimensional grid instead of the 3-dimensional one in order to have smaller files.
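The pattern in the table follows from how many chunks each read has to touch. A rough back-of-the-envelope sketch (hypothetical helper, ignoring caching and decompression costs):

```python
import math

n_land, n_time = 59199, 54787

def chunks_touched(c_land, c_time, read_land, read_time):
    # number of chunks a read of read_land x read_time values crosses,
    # assuming the read starts on a chunk boundary
    return math.ceil(read_land / c_land) * math.ceil(read_time / c_time)

# full time series at a single location under three of the layouts above
print(chunks_touched(59199, 1, 1, n_time))    # original layout: 54787 chunks
print(chunks_touched(1021, 945, 1, n_time))   # balanced layout:    58 chunks
print(chunks_touched(1, 54787, 1, n_time))    # time-optimized:      1 chunk
```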

More in the NetCDF documentation.


2 thoughts on “Accelerate NetCDF-4 read access”

  1. Interesting. Another simple way is using the Perl Data Language, PDL. You can import an entire NetCDF file to a PDL “piddle” and then extract slices across dimensions very simply.
