Accelerate NetCDF-4 read access

For simulating the (global) terrestrial biosphere I need time series of climate data, which is mostly provided in a 3-dimesional NetCDF file with the dimensions longitude, latitude and time. If everything is in a single file, this is often several GB large and extracting a timeseries along one position can take quiet some time, since by default NetCDF files are optimized to extract time slices. In the following I show how I speed up reading from a large NetCDF file along the time axis.

Continue reading “Accelerate NetCDF-4 read access”

Twitter friends and (un-)followers with R

A few weeks ago  the number of my twitter followers exceeded 100 and was going slightly down again and I was interested in who unfollowed me. Twitter does not offer that and I would have had to register at a third party page, which even wanted write access to my twitter account.  I did not think about it for a while and then I found the R package twitteR, which lets you fetch all required information and I used it in the following to evaluate my “friends” and “(un-)followers”.
Continue reading “Twitter friends and (un-)followers with R”

Broken axis with ggplot2

For visualizing my data I use R and the library ggplot2. And just lately I made some sensitivity simulations with out dynamic global vegetation model (DGVM) LPJ-GUESS. While summarizing the data per ecosystem and having a first look at the data I realized, that one ecosystem has up to 10 times higher values than all the others. That made me searching for “broken axis” and I didn’t find a satisfying solution, so I had to create my own.

Continue reading “Broken axis with ggplot2”

Retrieving and reprojecting MODIS satellite products

Data from the MODIS satellites are a great source for comparison to vegetation models and also for teaching. However, they come in a format I am not really used to and in a strange projection. The format is HDF, of which even several versions exists. But there are also several tools to convert it to a “better” file format for me. HDF needs much less space compared to other formats. Therefore it makes sense for satellite data to be stored as HDF.

First I need to know which data I want and in which region. Just recently a colleague asked me to get the fire product (MCD45A1) for Vietnam for him. Therefore I show it as example here.

On the map provided on the MODIS overview webpage one can try to select the correct row (v) and column (h) values. However, I already said it is a strange projection. I reprojected the grid to a standard equirectangular projection:

modis sinusoidal

That’s also not easy to select the right h/v values. However with both maps it worked. I set up a perl script using my module WebService::MODIS, which is also available on CPAN. Here is the example script, how to download the desired satellite data:

#!/usr/bin/env perl
use warnings;
use strict;
use WebService::MODIS;
readCache; # or "initCache;" if not done yet
my $fire = WebService::MODIS->new();
$fire->dates(["2007.01.01", "2014.12.31"]);
#print "$_\n" foreach $fire->url;

Now the files are in the current working directory. I do the mosaicing and reprojecting with ModisDownload.R. You may now ask: “Why not also downloading the data with the R Skript?” I did it in the beginning and it was very frustrating, since it crashed or did not start downloading at all and continued download was not possible. Therefore the perl module. To make use of different directories, where the downloaded files were and the reprojected files should be saved, I modified the original ModisDownload.R. Here is the diff of line 184 and 185 (Version 3.2, 13th Jan. 2014):

- write(paste(getwd(),"/",hdfNames[1], sep=""), mosaicname)
- for (j in 2:length(hdfNames)) write(paste(getwd(),"/",hdfNames[j], sep=""),mosaicname,append=T)
+ write(hdfNames[1], mosaicname)
+ for (j in 2:length(hdfNames)) write(hdfNames[j],mosaicname,append=T)

Now the filenames given to the mosaicHDF function must contain the full path to the files if called from another directory. ModisDownload.R uses the Modis Reprojection Tool (MRT), which is written in Java. Last year we used it for teaching. However, the installation under Windows it is a pain in the … Therefore, we preprocessed the data this year and supplied the students with GeoTiffs. Here is now the R script, which processes the downloaded hdf files and creates one GeoTiff per timeslice in UTM projection in the current working directory:

MRT.PATH <- "/path/to/MRT/bin"
MODISDOWNLOAD.DIR <- "/path/to/ModisDownload.R"
source(paste(MODISDOWNLOAD.DIR, "ModisDownload.R", sep="/"))
MODIS <- list(product="MCD45A1", bands="1 0 0 0 0 0 0 0", pixel.size=500)
DOWNLOAD.DIR <- "/path/to/downloaded/MODIS/files"
OUTPUT.DIR <- "/path/to/large/disk/for/GeoTIFFs"
downloaded <- list.files(DOWNLOAD.DIR, pattern="hdf$", full.names=TRUE)
if (length(downloaded) != 0) {
  yyyyddd <- sub("^.*\\.A", "", downloaded)
  yyyyddd <- unique(sub("\\..*$", "", yyyyddd))
  year <- trunc(as.integer(yyyyddd) / 1000)
  day.of.year <- as.integer(yyyyddd) - year * 1000
  dates <- as.Date(paste(year,"-01-01", sep=""), format="%Y-%m-%d") + day.of.year - 1
  for (j in 1:length(dates)) {
    in.hdf <- downloaded[grepl(paste("A", yyyyddd[j], sep=""), downloaded)]
    out.hdf <- paste(MODIS[["product"]], "_", dates[j], ".hdf", sep="")
    out.tif <- paste(MODIS[["product"]], "_", dates[j], ".tif", sep="")
    if (length(in.hdf)>1) {
      mosaicHDF(in.hdf, out.hdf, MRTpath=MRT.PATH)
    } else {
      file.symlink(in.hdf, out.hdf)
      proj_params="0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
    ## unlink(out.hdf)

Now I have a timeseries of Geotiffs in UTM projection I can work with.

Reading GPX tracks in R

I found another post on R-bloggers, which is very similar to my previous post. So I modified my script and it does not use the library plotKML anymore:


getTrack <- function(file, vars=c("time", "ele")) {
  pfile <- htmlTreeParse(file, error=function (...) {}, useInternalNodes=TRUE)

  trkpt <- xpathSApply(pfile, path = "//trkpt", xmlAttrs)
  create.df.str <- 'out.df <- data.frame(lon=as.numeric(trkpt["lon",]), lat=as.numeric(trkpt["lat",])'
  for (n in vars) {
    if (n=="ele") {
      ele <- as.numeric(xpathSApply(pfile, path = "//trkpt/ele", xmlValue))
      create.df.str <- paste(create.df.str, ", ele=ele", sep="")
    } else if (n=="time") {
      time <- xpathSApply(pfile, path = "//trkpt/time", xmlValue)
      time <- strptime(time, format = "%Y-%m-%dT%H:%M:%OS")
      create.df.str <- paste(create.df.str, ", time=time", sep="")
    } else {
      eval(parse(text=paste(n, ' <- xpathSApply(pfile, path = //trkpt/', n, ', xmlValue)', sep="")))
      create.df.str <- paste(create.df.str, ", ",n, "=", n, sep="")
  create.df.str <- paste(create.df.str, ")", sep="")

track <- getTrack("track.gpx")

track$delta.dist = 0
track$delta.dist[2:nrow(track)] = pointDistance(track[2:nrow(track), c("lon", "lat")], track[1:(nrow(track)-1), c("lon", "lat")], lonlat=TRUE)

track$delta.time = 0
track$delta.time[2:nrow(track)] = as.numeric(difftime(track$time[2:nrow(track)], track$time[1:(nrow(track)-1)], units="secs"))

track$speed = 3.6 * track$delta.dist / track$delta.time

Cycling to work and visualizing the track

Each spring I intent to cycle at least once a month from Mainz, where I live to work in Frankfurt. It’s a 43 km ride, mostly along the river Main. It is a nice route, especially in the morning, when I ride towards the sunrise. But you always have to pay attention to the rabbits, there are so many of them. In the morning, when it is still a bit dark I sometimes see them just in the last moment. Therefore I often ring my bell, so they hear me and run away. Of course there are many more animals to spot along the nice scenery. And even the urban and industrial parts of the tracks are kind of nice. As I already wrote the route goes along the river most of the time, only in Flörsheim, Okriftel and Höchst the route leads further away from the river.

The largest part is on asphalt, but there are also gravel and cobblestone sections, which is ok with my bike, but it might not really be suitable for racing bikes. There is another option further north, which is shorter but I am afraid it would be mostly on roads. Therefore I never tried it, but that alternative would be better for racing bikes.

Since they finished the new runway at the airport I would not like to live closer to Frankfurt. It is so loud, when the planes are descending over those villages alternating between old and new runway.

I recorded my track with a smartphone and with an old Garmin GPS device. The quality does not really differ much. Here is the track with some alternative shortcuts:

You can create and publish maps like this in Google Maps.

Getting the data

To read the data from my Garmin device I use gpsbabel. You have to make sure that the module garmin_gps is not loaded once the device is plugged in and turned on (“rmmod garmin_gps” as root or blacklist it, entry “blacklist garmin_gps” in a file in /etc/modprobe.d/; the file name is distribution dependent).

> gpsbabel -t -i garmin -f usb: -o gpx -F garmin_tracks.gpx

On the iPhone I use iSmoothRun, which exports the data to my dropbox in gpx format. For further visualization I use R with the libraries ggplot2, sp, raster and plotKML. The last one has a lot of dependencies.
You have to find out which of the recorded tracks from the Garmin device is the one to use, in my case it is number 5. If your track was too long and cut into pieces, you have to do it analogue to the iPhone GPX track.

garmin.gpx <- readGPX("garmin_tracks.gpx")
garmin.track <- garmin.gpx$tracks[[5]][[1]]
garmin.spdf <- SpatialPointsDataFrame(garmin.track[1:2], garmin.track[3:ncol(garmin.track)], proj4string=CRS("+proj=longlat +ellps=WGS84"))
iphone.gpx <- readGPX("ismoothrun_export.gpx")
iphone.track <- iphone.gpx$tracks[[1]][[2]]
for (i in 2:length(iphone.gpx$tracks[[1]])) {
  iphone.track <- rbind(iphone.track, iphone.gpx$tracks[[1]][[i]])
iphone.spdf <- SpatialPointsDataFrame(iphone.track[1:2], iphone.track[3:ncol(iphone.track)], proj4string=CRS("+proj=longlat +ellps=WGS84"))

I now have a SpatialPointsDataFrame in a longitude/latitude projection. To visualize anything in dependence on the distance cycled, I can reproject the data to UTM, calculate the distance between the points and append it as a new column to the data of the SpatialPointsDataFrame:

iphone.utm <- coordinates(spTransform(iphone.spdf, CRS=CRS("+proj=utm +units=km")))
garmin.utm <- coordinates(spTransform(garmin.spdf, CRS=CRS("+proj=utm +units=km")))
iphone.dist <- sqrt((iphone.utm[2:nrow(iphone.utm), 1] - iphone.utm[1:(nrow(iphone.utm)-1), 1])^2 +
  (iphone.utm[2:nrow(iphone.utm), 2] - iphone.utm[1:(nrow(iphone.utm)-1), 2])^2)
garmin.dist <- sqrt((garmin.utm[2:nrow(garmin.utm), 1] - garmin.utm[1:(nrow(garmin.utm)-1), 1])^2 +
  (garmin.utm[2:nrow(garmin.utm), 2] - garmin.utm[1:(nrow(garmin.utm)-1), 2])^2)
iphone.spdf@data$dist = 0
iphone.spdf@data$dist[2:nrow(iphone.spdf@data)] = cumsum(iphone.dist)
garmin.spdf@data$dist = 0
garmin.spdf@data$dist[2:nrow(garmin.spdf@data)] = cumsum(garmin.dist)

For what ever reason, the elevation is defined as character. Therefore it has to be converted to numeric first:

garmin.spdf@data$ele = as.numeric(garmin.spdf@data$ele)
iphone.spdf@data$ele = as.numeric(iphone.spdf@data$ele)

To make it understandable by ggplot2, it has to be converted to a combined simple data.frame:

garmin.df <- data.frame(garmin.spdf)
iphone.df <- data.frame(iphone.spdf)
combined <- data.frame(garmin.df, device="Garmin", stringsAsFactors=FALSE)
combined <- rbind(combined, data.frame(iphone.df, device="iPhone", stringsAsFactors=FALSE))
p <- ggplot(combined, aes(x=dist, y=ele, color=device))
p <- p + geom_line(size=1.5)
p <- p + xlab("Distance [km]") + ylab("Elevation [m]")

I would say both devices are not really good in this flat terrain. The steepest and highest elevation are the three bridges I cross over the Rhein and Main. And yes it goes more or less up all the time, but not as much as the Garmin measures, despite barometric device. However, I was curious how the date compares to a digital elevation model (DEM) and extracted the data along the track. I use the corresponding tile of the GTOPO30 dataset:

gtopo <- readGDAL("W020N90.DEM")
gtopo.iphone.spdf <- extract(raster(gtopo), iphone.spdf, sp=TRUE, fun=mean)
gtopo.iphone.df <- data.frame(gtopo.iphone.spdf)
p <- ggplot(combined, aes(x=dist, y=ele, color=device))
p <- p + geom_line(size=1.5)
p <- p + geom_line(data=gtopo.iphone.df, aes(y=band1), color="black", linetype=3, size=1.5, alpha=0.5)
p <- p+ xlab("Distance [km]") + ylab("Elevation [m]")

To my surprise the elevation recorded by the iPhone looks much better compared to the Garmin GPS in relation to the GTOPO30 data (dotted line), despite barometric altitude calculation.
One could now also reformat the time column, calculate speed, use it as axis, …

garmin.df$time <- strptime(garmin.df$time, format="%Y-%m-%dT%H:%M:%SZ",tz="CEST")
iphone.df$time <- strptime(iphone.df$time, format="%Y-%m-%dT%H:%M:%SZ",tz="CEST")


The problem with my plan cycling once a month is, I always have reasons not to cycle: the weather, it’s to cold, hot, raining; I have a slight cold; The planes a descending over Mainz, which means easterly winds; … So this year I made it only three times so far. However, hopefully Stadtradeln motivated me to cycle a bit more often now.

And also the it is a really nice scenery. Here are some impressions from my tours:

Autumn sunrise above the foggy alluvial meadow of the Main

View from the bridge in Sindlingen to both sides, even industry can be nice.

And some spiderwebs