Accelerate NetCDF-4 read access

For simulating the (global) terrestrial biosphere I need time series of climate data, which is mostly provided in a 3-dimesional NetCDF file with the dimensions longitude, latitude and time. If everything is in a single file, this is often several GB large and extracting a timeseries along one position can take quiet some time, since by default NetCDF files are optimized to extract time slices. In the following I show how I speed up reading from a large NetCDF file along the time axis.

Continue reading “Accelerate NetCDF-4 read access”


Simple greenhouse gas layer model

Since carbon dioxide in the atmosphere is still increasing and it seems as if it will not stop increasing in the near future, it is important to know about the greenhouse gas effect. On earth is it neither as cold as on Mars (-55°C) nor as hot as on Venus (>400°C). This has not only to do with the distance from sun, but also with the greenhouse gases. On Mars there is hardly any atmosphere, leading to warm temperatures during day and very cold temperatures during night. You might have experienced clear nights with a much larger temperature difference to daytime temperatures, whereas with a cloudy sky the temperature difference between day and night is not as large. This is because water is a very effective greenhouse gas. However, there are hardly any possibilities to alter the water content in the atmosphere, but we could reduce our CO2 emissions. On Venus on the other side the atmosphere contains a lot of CO2 and other gases leading to a very strong greenhouse gas effect.

Quiet a while ago I’ve written a simple program to calculate the greenhouse gas effect, and since I’m not really good in documenting my code the following text is an extended documentation of a simple greenhouse gas layer model written in R, which is based on chapter 3 “The layer model” of David Archer’s book “Global Warming: Understanding the Forecast” (2008) (web page, publisher). The book is easy to read and the formulas are even understandable by a non-physicist like me. In the following I will briefly explain the simple radiation budget of the earth.

Solar radiation

If the earth is in equilibrium, with respect to its radiation budget, the amount of energy reaching the earth must be equal the amount of energy leaving the earth for a given time period. Energy is measured in Joule and energy per time is called power and is measured in Watts, where Watts = Joule/second or in scientific notation W = J s-1. The energy source for the earth is the sun and it is possible, to calculate suns mean energy reaching the earth, the so called solar constant Isolar (it is not really a constant, but for simplicity it is assumed to be constant).

According to the Stefan-Boltzman law the power emitted by a so called “black body” per area in Watts per square meter is equal to the surface temperature to the power of 4 times the Stefan-Boltzman constant σ. With the average surface temperature of the sun in Kelvin Tsun = 5778 K and σ = 5.671*10-8W m-2 K-4 the sun emits σ*Tsun4 = 63.21 MW m-2. That’s really much. The sun has a radius of rsun = 696*106 m and therefore a surface of 4*π*r2 = 6.09*1018 m2. By multiplying these two large numbers, the power emitted per area and suns surface area, we end up with the total power emitted by the sun in all directions, which is 3.85*1026 W. In Wikipedia’s list of the largest power stations in the world the Three Gorges Dam with 2.5*109 W is the largest power station to date, which is still nothing compared to the sun.

Of course, not all this power is reaching the earth, the total power emitted by the sun must be divided by the surface of a sphere with the distance of the earth from the sun dsun-earth The distance is on average dsun-earth = 149.6*109 m. This is so far, that the radiuses of earth and sun can be neglected. According to the above already used formula to calculate the surface of a sphere, the solar constant calculated here is Isolar = 3.85*1026 W / (4*π*dsun-earth2) = 1368.123 W m-2.


Radiation budget without atmosphere

Everybody might have seen pictures of the earth and realized that some parts are bright and others are dark. This means a part of the light and therefore the energy from the sun is directly reflected back to space, without being taken up by the earth surface and converted to “heat” or infrared radiation. The fraction of back scattered light or energy is called albedo with the symbol α and is on average 33% of incoming solar radiation. As mentioned above, the average incoming power must be equal the outgoing power. Because the sun only shines on one side of the earth the solar constant must be multiplied by the surface of a circle, the cross section of the earth (Isolar*(1 – α)*π*rearth2), but the earth emits in all directions, therefore the outgoing power must be multiplied by the surface of a sphere and we assume that the earth also behaves like a “black body” (4*π*rearth2*σ*Tearth4) to calculate earth surface temperature without atmosphere. As already mentioned the incoming and outgoing radiation must be equal and by solving Isolar*(1 – α)*π*r2 = 4*π*rearth2*σ*Tearth4 it is possible to calculate earth surface temperature if it had no atmosphere: Tearth = [(1-α)*Isolar / (4*σ)]0.25 (where the power to 0.25 stand for the forth root). By filling in the above values the earth surface would have an average temperature of 252.13 K equal to -20°C, pretty cold. So luckily we have an atmosphere on earth not only to breath, but also to raise the temperature.

Greenhouse layer

If now the atmosphere comes in our model, the above radiation budget is valid for the top of the atmosphere. So incoming solar power must equal outgoing infrared radiation at the top of the atmosphere. We assume a very simple atmosphere, which is transparent to the incoming radiation, but absorbing and emitting all infrared radiation. However, it takes up infrared radiation from the earth but emits it on both sides, back to earth and to space. By solving a set of formulas the surface temperature below the greenhouse layer equal 20.25*252.13 K = 299.8 K, a much more comfortable temperature now. We could now add more and more greenhouse layers, each of them would increase the surface temperature by the factor of 20.25 = 1.189.

Dust layer

One could also assume a dusty atmosphere layer, which absorbs a fraction of sun’s radiation, and emits infrared radiation to all direction. Guess what happens. Of course the earth surface would be cooler. I don’t want to go into details here, if you want to know more, look at the code, read David Archer’s or another atmospheric physics book. I was not yet able to combine the two models the greenhouse and the dust layer. And I will not finish it.

Final remarks

In the real world things are not so simple. The solar constant is not equal through out the year and over geological time periods, since the distance to earth changes and there are times with more or less sun spots. The albedo is not the same everywhere, some clouds and ice are much more reflective than forests or the oceans. A fraction of the solar radiation energy is not converted to infrared radiation and reemitted, but used for evaporation and photosynthesis. The energy in the evaporated and transpired water vapor is transported elsewhere and released, when the water condenses and precipitates. We don’t have a closed greenhouse gas layer and no equally distributed dust layer in the atmosphere.

Most of these processes are implemented in global climate models in a simplified way, which we call parameterization. These models divide the earth surface in small squares and the atmosphere in boxes above these squares to calculate all known processes. The models are parameterized to our best knowledge to represent reality as good as possible. And these global climate models do not run on a single PC they need several tens to hundreds, so called clusters. However, the above described model is already good enough to yield an approximation of the greenhouse gas effect and reality (observed mean earth temperature: 295 K or 22°C) is in between the two calculated temperatures of 252 K and 300 K, somewhat closer to the single greenhouse layer model.

Global top soil sand content

Global soil data for vegetation modelling?

Quiet some time ago I uploaded some scripts to github, which import global soil data to Grass-GIS. In the following I will demonstrate the usage of the script with the Harmonized World Soil Database (HWSD) as example. At the end I will briefly discuss some of the data.

The soil data is stored in a MS Access database linked to a raster file. These datasets are freely available in the internet: DSMW, HWSD available through the geonetwork of the FAO and ISRIC-WISE on their webpage.

Grass-GIS is a very powerful tool. I use it mainly to convert geographical data, for calculations and for maps in scientific journals, p.e. the nitrogen and carbon fixation maps in Elbert et al. (2012) were processed and created in Grass-GIS.

My script will store the data in a SQLite data file linked to a Grass-GIS vector map. The individual layers can finally be exported to various other formats for example as input to dynamic global vegetation models.

To use the script you need Grass-GIS with a lat-lon location and 30 arc seconds resolution. Here is the output of the Grass-GIS command “g.region -p”, once the locations is created:

projection: 3 (Latitude-Longitude)
zone: 0
datum: wgs84
ellipsoid: wgs84
north: 90N
south: 90S
west: 180W
east: 180E
nsres: 0:00:30
ewres: 0:00:30
rows: 21600
cols: 43200
cells: 933120000

Another prerequisite is the linux package “mdb-tools” with the commands “mdb-export” and “mdb-schema” to handle the Access database and the linux package “sqlite3”. The following command are executed in the above Grass-GIS environment. First some environmental variables are set. The variable HWSDDIR must point to the directory, where the HWSD data was downloaded to. A temporary directory defined by the variable TMPDIR needs to be present for temporary files.

GISDB=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/`g.gisenv MAPSET`/db.sqlite
TMPDIR=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/tmp.$$

if [ -d $TMPDIR ]; then
  echo "Temporary \"$TMPDIR\"folder exists!"
  echo "Delete it first"
  exit 1
  mkdir $TMP_DIR

db.connect driver=sqlite database=${GISDB}

Then the underlying raster map is imported and converted to a vector map and connected to the database. The “v.clean” command cleans up the topology and speeds up further processing substantially. -o --overwrite input=${HWSDDIR}/hwsd.bil output=${rastBasemap} --overwrite input=${rastBasemap} output=${vectBasemap}_tmp feature=area
sqlite3 ${GISDB} "ALTER TABLE ${vectBasemap}_tmp ADD area REAL" option=area type=centroid units=k map=${vectBasemap}_tmp column=area

v.clean --overwrite input=${vectBasemap}_tmp output=${vectBasemap} tool=bpol,rmarea thresh=0.4,0.4
g.remove vect=${vectBasemap}_tmp

The following commands set up the tables in the database of all tables in the original soil data.

mdb-schema ${HWSDDIR}/HWSD.mdb mysql > ${TMPDIR}/HWSD_CREATE.sql
sqlite3 ${GISDB} < ${TMPDIR}/HWSD_CREATE.sql

The following loop imports all data into the just created tables

for table in `mdb-tables ${HWSDDIR}/HWSD.mdb`; do
  mdb-export -d '|' ${HWSDDIR}/HWSD.mdb $table \
    | sed -e '1d' \
    | sed -e 's/\"//g' > ${TMPDIR}/${table}.csv
  cat << EOF | sqlite3 ${GISDB}
.separator "|"
.import ${TMPDIR}/${table}.csv $table
  rm -f ${TMPDIR}/${table}.csv

The following loop joins the data of the soil mapping unit with the vector database:

newColNames=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/^\s*//' | sed -e 's/\s.*//' | sed -e 's/\`//g'`)
newColTypes=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/,\s//' | sed -e 's/.*\s\s//' | sed -e 's/\s//g'`)

cat /dev/null > ${TMPDIR}/join.sql
for ((i=0; i > ${TMPDIR}/join.sql
  echo "ALTER TABLE ${vectBasemap} ADD ${newColNames[$i]} ${newColTypes[$i]};" >> ${TMPDIR}/join.sql
  echo "UPDATE ${vectBasemap} SET ${newColNames[$i]}=(SELECT ${newColNames[$i]} FROM HWSD_SMU WHERE HWSD_SMU.MU_GLOBAL=${vectBasemap}.value);" >> ${TMPDIR}/join.sql
sqlite3 ${GISDB} < ${TMPDIR}/join.sql

Now the general data is in the Grass-GIS database and can be exported or displayed. For that I add a temporary column to the database of type integer. The map can either be used as raster or vector map. You can define the variable either as a single value as done here or you create a loop over some values to convert several soil type at once.

echo "ALTER TABLE ${vectBasemap} ADD isubset INTEGER;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET isubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET isubset=1 WHERE SU_SYMBOL='${soil}';" >> new.sql
echo "UPDATE ${vectBasemap} SET isubset=0 WHERE isubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql --quiet --overwrite input=${vectBasemap} output=SOIL_${soil} use=attr column=isubset
r.mapcalc "SOIL_tmp=if(SOIL_${soil}==1, 1, null())" --quiet --overwrite input=SOIL_tmp output=SOIL_${soil} feature=area

There is an additional information in the table HWSD_DATA, which is the raw material the SU_SYMBOL was made of. That is the data derived from different other sources like the Digital Soil Map of the World (DSMW), Chinese, European soil data and SOTER. However, usage of these soil parameters, including physical and chemical properties need to be taken with caution. Especially across the border of the metioned data sources as I will demonstate.

Allthough the HWSD allready has a very high resolution of 30 arc seconds, which is slightly less than a kilometer at the equator, it can consist of 10 different soil classes within one gridcell given by the column SEQ (counter) and SHARE (percent coverage) in the table HWSD_DATA. The SHARE goes even down to 1%, meaning a soil definition of less than one hectare on a global scale.

echo "ALTER TABLE ${vectBasemap} ADD rsubset REAL;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET rsubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET rsubset=(SELECT SHARE FROM HWSD_DATA WHERE HWSD_DATA.MU_GLOBAL=${vectBasemap}.MU_GLOBAL AND (HWSD_DATA.SU_SYM74 LIKE '${soil}%' OR HWSD_DATA.SU_SYM85 LIKE '${soil}%' OR HWSD_DATA.SU_SYM90 LIKE '${soil}%'));" >> new.sql
echo "UPDATE ${vectBasemap} SET rsubset=0 WHERE rsubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql --quiet --overwrite input=${vctBasemap} output=SHARE_${soil} type=area use=attr column=rsubset

I calculated the total area of three soil types (Arenosols, Solonetz and Solonchack) and dune sand based on the fractional coverage as well as the dominant soil type:

dominant fractional
Arenosol 957 442
Dune sand 357 280
Solonchack 128 88
Solonetz 204 140

And another drawback comes up, once you look at the distribution of the soils. The color code in the following map is the fractional coverage and the black lines are around regions, where the soil type is defined as dominant. The Arenosol (upper panel) stops at the Egyptian border and is dune sand (lower panel) in Lybia. That’s because at the Arenosols were not defined in the DSMW, which is the source for the soil definition in most of the Sahara region.


And this brings me to my next criticism. If you properly extract the high resolution texture (in the following I will concentrate on sand only). And caculate a wheighted mean of all subsoil types per pixel, you also see the artefact of the different source databases:

Those pixel fractions defined as dune sand in the DSMW source regions didn’t even get any texture information in the HWSD_DATA table. However, the mean sand content of all Arenosols is 90% and if I give the dune sand this texture property and recalculate the sand content my map looks much better:

My overall conclusion hereafter is, using the HWSD as input for global applications is much to complicated and not straight forward. If you want to have a consistent dataset of global soil properties one should stick to the older DSMW or IRIC-WISE data. Nevertheless, I am also looking forward to soilgrids a new dataset of global soil properties.