Cycling to work and visualizing the track

Each spring I intend to cycle at least once a month from Mainz, where I live, to my work in Frankfurt. It is a 43 km ride, mostly along the river Main. It is a nice route, especially in the morning, when I ride towards the sunrise. But you always have to pay attention to the rabbits; there are so many of them. In the morning, when it is still a bit dark, I sometimes spot them only at the last moment, so I often ring my bell to make them hear me and run away. Of course there are many more animals to spot along the way, and even the urban and industrial parts of the track are kind of nice. As I already wrote, the route follows the river most of the time; only in Flörsheim, Okriftel and Höchst does it lead further away from the river.

The largest part is on asphalt, but there are also gravel and cobblestone sections, which is fine with my bike but might not really be suitable for racing bikes. There is another option further north, which is shorter, but I am afraid it would be mostly on roads. Therefore I have never tried it, but that alternative would be better for racing bikes.

Since the new runway at the airport was finished, I would not like to live any closer to Frankfurt. It is so loud when the planes descend over those villages, alternating between the old and the new runway.

I recorded my track with a smartphone and with an old Garmin GPS device. The quality does not really differ much. Here is the track with some alternative shortcuts:

You can create and publish maps like this in Google Maps.

Getting the data

To read the data from my Garmin device I use gpsbabel. You have to make sure that the module garmin_gps is not loaded once the device is plugged in and turned on (“rmmod garmin_gps” as root or blacklist it, entry “blacklist garmin_gps” in a file in /etc/modprobe.d/; the file name is distribution dependent).

> gpsbabel -t -i garmin -f usb: -o gpx -F garmin_tracks.gpx

On the iPhone I use iSmoothRun, which exports the data to my Dropbox in GPX format. For further visualization I use R with the libraries ggplot2, sp, raster and plotKML. The last one has a lot of dependencies.
You have to find out which of the recorded tracks from the Garmin device is the one to use; in my case it is number 5. If your track was too long and was cut into pieces, you have to concatenate the pieces analogously to the iPhone GPX track below.

library(ggplot2)
library(sp)
library(plotKML)
garmin.gpx <- readGPX("garmin_tracks.gpx")
garmin.track <- garmin.gpx$tracks[[5]][[1]]
garmin.spdf <- SpatialPointsDataFrame(garmin.track[1:2], garmin.track[3:ncol(garmin.track)], proj4string=CRS("+proj=longlat +ellps=WGS84"))
iphone.gpx <- readGPX("ismoothrun_export.gpx")
# concatenate all track segments of the iPhone recording into one track
iphone.track <- iphone.gpx$tracks[[1]][[1]]
for (i in 2:length(iphone.gpx$tracks[[1]])) {
  iphone.track <- rbind(iphone.track, iphone.gpx$tracks[[1]][[i]])
}
iphone.spdf <- SpatialPointsDataFrame(iphone.track[1:2], iphone.track[3:ncol(iphone.track)], proj4string=CRS("+proj=longlat +ellps=WGS84"))

I now have a SpatialPointsDataFrame in a longitude/latitude projection. To visualize anything as a function of the distance cycled, I can reproject the data to UTM, calculate the distance between consecutive points and append the cumulative distance as a new column to the data of the SpatialPointsDataFrame:

iphone.utm <- coordinates(spTransform(iphone.spdf, CRS=CRS("+proj=utm +units=km")))
garmin.utm <- coordinates(spTransform(garmin.spdf, CRS=CRS("+proj=utm +units=km")))
iphone.dist <- sqrt((iphone.utm[2:nrow(iphone.utm), 1] - iphone.utm[1:(nrow(iphone.utm)-1), 1])^2 +
  (iphone.utm[2:nrow(iphone.utm), 2] - iphone.utm[1:(nrow(iphone.utm)-1), 2])^2)
garmin.dist <- sqrt((garmin.utm[2:nrow(garmin.utm), 1] - garmin.utm[1:(nrow(garmin.utm)-1), 1])^2 +
  (garmin.utm[2:nrow(garmin.utm), 2] - garmin.utm[1:(nrow(garmin.utm)-1), 2])^2)
iphone.spdf@data$dist = 0
iphone.spdf@data$dist[2:nrow(iphone.spdf@data)] = cumsum(iphone.dist)
garmin.spdf@data$dist = 0
garmin.spdf@data$dist[2:nrow(garmin.spdf@data)] = cumsum(garmin.dist)

For whatever reason, the elevation is stored as character, so it has to be converted to numeric first:

garmin.spdf@data$ele = as.numeric(garmin.spdf@data$ele)
iphone.spdf@data$ele = as.numeric(iphone.spdf@data$ele)

To make the data usable by ggplot2, both tracks have to be converted into one combined, plain data.frame:

garmin.df <- data.frame(garmin.spdf)
iphone.df <- data.frame(iphone.spdf)
combined <- data.frame(garmin.df, device="Garmin", stringsAsFactors=FALSE)
combined <- rbind(combined, data.frame(iphone.df, device="iPhone", stringsAsFactors=FALSE))
p <- ggplot(combined, aes(x=dist, y=ele, color=device))
p <- p + geom_line(size=1.5)
p <- p + xlab("Distance [km]") + ylab("Elevation [m]")
print(p)

[Figure track_elevation: elevation profiles recorded by the Garmin device and the iPhone]
I would say both devices are not really good in this flat terrain. The steepest and highest elevations are the three bridges I cross over the Rhein and the Main. And yes, it goes more or less uphill all the time, but not as much as the Garmin measures, despite its barometric sensor. However, I was curious how the data compares to a digital elevation model (DEM) and extracted the DEM values along the track. I use the corresponding tile of the GTOPO30 dataset:

library(raster)
library(rgdal)
gtopo <- readGDAL("W020N90.DEM")
gtopo.iphone.spdf <- extract(raster(gtopo), iphone.spdf, sp=TRUE, fun=mean)
gtopo.iphone.df <- data.frame(gtopo.iphone.spdf)
p <- ggplot(combined, aes(x=dist, y=ele, color=device))
p <- p + geom_line(size=1.5)
p <- p + geom_line(data=gtopo.iphone.df, aes(y=band1), color="black", linetype=3, size=1.5, alpha=0.5)
p <- p + xlab("Distance [km]") + ylab("Elevation [m]")
print(p)

To my surprise, the elevation recorded by the iPhone agrees much better with the GTOPO30 data (dotted line) than the Garmin GPS does, despite the latter's barometric altitude measurement.
[Figure track_elevation_gtopo: elevation profiles of both devices compared to GTOPO30 (dotted line)]
One could now also parse the time column, calculate the speed, use it as an axis, and so on:

garmin.df$time <- strptime(garmin.df$time, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")  # GPX time stamps are UTC
iphone.df$time <- strptime(iphone.df$time, format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")
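
For example, a rough speed estimate from the distance and time columns could look like this (a minimal sketch, not part of the original analysis; note that the raw GPS speed is quite noisy):

# speed between consecutive points in km/h
iphone.dt <- as.numeric(diff(iphone.df$time), units="hours")
iphone.speed <- diff(iphone.df$dist) / iphone.dt
speed.df <- data.frame(dist=iphone.df$dist[-1], speed=iphone.speed)
p <- ggplot(speed.df, aes(x=dist, y=speed)) + geom_line()
p <- p + xlab("Distance [km]") + ylab("Speed [km/h]")
print(p)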

Remarks

The problem with my plan of cycling once a month is that I always have reasons not to cycle: the weather (it is too cold, too hot, raining); I have a slight cold; the planes are descending over Mainz, which means easterly winds; and so on. So this year I have made it only three times so far. However, hopefully Stadtradeln will motivate me to cycle a bit more often now.

And the scenery really is nice. Here are some impressions from my tours:

Autumn sunrise above the foggy alluvial meadow of the Main

View from the bridge in Sindlingen to both sides, even industry can be nice.

And some spiderwebs


Another workflow for the previous panorama photograph

In my previous post I processed an exposure series and combined the pictures. That was how I had done it before, thinking it was easy and effective enough, without really looking for alternatives. After publishing the post and being inspired by my father's comments and a blog post by Gregory Long to use RAW editors, I started looking for alternatives. The two Open Source options I saw under Linux were rawtherapee and darktable. After first attempts with both of them I decided to use darktable, although rawtherapee has predefined profiles like "Natural 1", predefined white balances like "daylight" and much more user-friendly stuff. I found one "first steps" tutorial (in German) for darktable, and darktable had one option which really convinced me: the "lens correction". This feature flattened the horizon so that I could combine the panorama picture in Hugin. I applied some more filters and came up with the following picture:

[Figure: final panorama. Projection: Rectilinear (0), FOV: 95 x 65, Ev: 12.71]

The foreground is darker compared to the HDR picture. Nevertheless, this workflow uses one application less and therefore saves some time. And I now know a little about RAW editors and how to use them. I think it is now a matter of taste which picture you like more.

Creating an HDR panorama picture

For my holiday I went to the North Sea this year and of course, like everybody else there, I had to take a sunset picture. However, I did not want to make it too easy for myself. Therefore I took two series of pictures with multiple exposures (-1, 0, 1 EV), which overlapped by approximately half. I do not own a high-quality DSLR camera, just a good compact camera. In the following I will describe what I did to create a panorama of these pictures and which problems I ran into. Here are the initial pictures of the sunset:

[Figure hdr_pano_original: the two exposure series of the sunset]

Combining the multiple exposed pictures

First of all I created the HDR pictures with Luminance HDR. When you create a new HDR picture you are prompted for the pictures and have the options to “Autoalign images” and “Auto-crop” them, which is a good choice if you take the pictures without a tripod like me. I followed more or less the description given here to create the HDR pictures and will briefly describe it:

After the alignment finished, I chose “profile 5” and created three different HDR pictures.

  1. I immediately saved the HDR image preview (“Save HDR image preview” option in the File menu). You can see the picture in the first column/row below.
  2. The next picture is with the Ashikhmin algorithm with equation 4 and “Local contrast Threshold” set to 0 (second column).
  3. The last picture is the default Mantiuk ’06 algorithm (last column)

I repeated this for the second series of pictures (second row).

[Figure hdr_pano_original_ldr: the tone-mapped pictures; columns are the three variants, rows the two series]

The next step is combining these three pictures. As you can see above, some algorithms are good at capturing the contrast but bad at capturing the colors, and vice versa. For the further image manipulation I use Gimp. I overlaid the "preview" image with the "Ashikhmin" picture at 80% opacity in mode "Multiply", and on top with the "Mantiuk '06" picture at 30% opacity, also in multiply mode. Then I flattened the image and went on to create the panorama.

Panorama creation

I used Hugin to create the panorama. Since all the camera metadata was lost in the meantime, I had to get the "Focal length" and its "multiplier" from one of the original pictures and fill them in by hand when the pictures were loaded. Then I could let Hugin do all the work and ended up with the following picture:

[Figure hdr_pano_bended_lres: first panorama attempt with a bent horizon]

That looks odd! There are two hills in the flat ocean! That must have something to do with my camera. So far I had created panorama pictures only in rough terrain, never where the horizon should be absolutely flat. Therefore I had to use Gimp again to straighten the horizon. The filter "Lens distortion" in the "Distorts" sub-menu did the job sufficiently well. I adjusted the "Main" and "Zoom" values until the horizon was bent slightly in the other direction, which gave me the following pictures:

[Figure hdr_corrected: the two pictures after the lens-distortion correction]

And I ended up with this beautiful HDR panorama picture:

[Figure hdr_pano_lres: the final HDR panorama]

That’s almost perfect! The horizon is not yet absolutely straight, but taking into account that this was relatively quick the result looks sufficient. And of course it can never replace viewing the original sunset.

Simple greenhouse gas layer model

Since carbon dioxide in the atmosphere is still increasing and it seems as if it will not stop increasing in the near future, it is important to know about the greenhouse gas effect. On Earth it is neither as cold as on Mars (-55°C) nor as hot as on Venus (>400°C). This has not only to do with the distance from the sun, but also with the greenhouse gases. On Mars there is hardly any atmosphere, leading to warm temperatures during the day and very cold temperatures during the night. You might have experienced that clear nights show a much larger temperature difference to daytime temperatures, whereas with a cloudy sky the difference between day and night is not as large. This is because water is a very effective greenhouse gas. However, there are hardly any possibilities to alter the water content of the atmosphere, but we could reduce our CO2 emissions. On Venus, on the other hand, the atmosphere contains a lot of CO2 and other gases, leading to a very strong greenhouse gas effect.

Quite a while ago I wrote a simple program to calculate the greenhouse gas effect, and since I am not really good at documenting my code, the following text is an extended documentation of a simple greenhouse gas layer model written in R, which is based on chapter 3, "The layer model", of David Archer's book "Global Warming: Understanding the Forecast" (2008) (web page, publisher). The book is easy to read and the formulas are understandable even for a non-physicist like me. In the following I will briefly explain the simple radiation budget of the earth.

Solar radiation

If the earth is in equilibrium with respect to its radiation budget, the amount of energy reaching the earth must equal the amount of energy leaving it over a given time period. Energy is measured in Joule, and energy per time is called power and is measured in Watts, where Watt = Joule/second, or in scientific notation W = J s^-1. The energy source for the earth is the sun, and it is possible to calculate the sun's mean energy flux reaching the earth, the so-called solar constant I_solar (it is not really a constant, but for simplicity it is assumed to be one).

According to the Stefan-Boltzmann law, the power emitted by a so-called "black body" per area, in Watts per square meter, is equal to the surface temperature to the power of 4 times the Stefan-Boltzmann constant σ. With the average surface temperature of the sun T_sun = 5778 K and σ = 5.671*10^-8 W m^-2 K^-4, the sun emits σ*T_sun^4 = 63.21 MW m^-2. That is a lot. The sun has a radius of r_sun = 696*10^6 m and therefore a surface area of 4*π*r_sun^2 = 6.09*10^18 m^2. By multiplying these two large numbers, the power emitted per area and the sun's surface area, we end up with the total power emitted by the sun in all directions, which is 3.85*10^26 W. In Wikipedia's list of the largest power stations in the world, the Three Gorges Dam with 22.5*10^9 W is the largest power station to date, which is still nothing compared to the sun.

Of course, not all of this power reaches the earth: the total power emitted by the sun must be divided by the surface of a sphere whose radius is the distance of the earth from the sun, d_sun-earth. This distance is on average d_sun-earth = 149.6*10^9 m, which is so far that the radii of earth and sun can be neglected. Using the same formula for the surface of a sphere as above, the solar constant calculated here is I_solar = 3.85*10^26 W / (4*π*d_sun-earth^2) = 1368.123 W m^-2.
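
These numbers are easy to verify; here is a minimal R sketch using only the constants given above (a back-of-the-envelope check, not part of the original model code):

sigma <- 5.671e-8        # Stefan-Boltzmann constant [W m-2 K-4]
T.sun <- 5778            # surface temperature of the sun [K]
r.sun <- 696e6           # radius of the sun [m]
d.sun.earth <- 149.6e9   # mean distance between sun and earth [m]

flux.sun <- sigma * T.sun^4                   # ~63.21e6 W m-2 emitted per square meter
P.sun <- flux.sun * 4 * pi * r.sun^2          # ~3.85e26 W total power of the sun
I.solar <- P.sun / (4 * pi * d.sun.earth^2)   # ~1368 W m-2, the solar constant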

[Figure sun_earth_radiation: sketch of the solar radiation reaching the earth]

Radiation budget without atmosphere

Most people have seen pictures of the earth and noticed that some parts are bright and others are dark. This means that part of the light, and therefore the energy from the sun, is directly reflected back to space without being taken up by the earth's surface and converted to "heat", i.e. infrared radiation. The fraction of back-scattered light or energy is called albedo, with the symbol α, and amounts on average to 33% of the incoming solar radiation. As mentioned above, the average incoming power must equal the outgoing power. Because the sun only shines on one side of the earth, the solar constant must be multiplied by the area of a circle, the cross section of the earth (I_solar*(1 - α)*π*r_earth^2), whereas the earth emits in all directions, so the outgoing power must be multiplied by the surface of a sphere, assuming the earth also behaves like a "black body" (4*π*r_earth^2*σ*T_earth^4). As already mentioned, the incoming and outgoing radiation must be equal, and by solving I_solar*(1 - α)*π*r_earth^2 = 4*π*r_earth^2*σ*T_earth^4 it is possible to calculate the earth's surface temperature if it had no atmosphere: T_earth = [(1 - α)*I_solar / (4*σ)]^0.25 (where the power of 0.25 stands for the fourth root). Filling in the above values, the earth's surface would have an average temperature of 252.13 K, about -21°C, pretty cold. So luckily we have an atmosphere on earth, not only to breathe, but also to raise the temperature.
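
Continuing the small R sketch from above, the bare-earth temperature follows directly from this energy balance (again only the numbers from the text are used):

albedo <- 0.33                                          # mean planetary albedo used in the text
T.bare <- ((1 - albedo) * I.solar / (4 * sigma))^0.25   # ~252 K, about -21 degC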

Greenhouse layer

If we now add an atmosphere to our model, the above radiation budget is valid for the top of the atmosphere: incoming solar power must equal outgoing infrared radiation at the top of the atmosphere. We assume a very simple atmosphere, which is transparent to the incoming solar radiation but absorbs and emits all infrared radiation. It takes up the infrared radiation from the earth but emits it on both sides, back to earth and out to space. By solving the resulting set of equations, the surface temperature below the greenhouse layer equals 2^0.25*252.13 K = 299.8 K, a much more comfortable temperature. We could now add more and more greenhouse layers; with N perfectly absorbing layers the surface temperature becomes (N+1)^0.25 times the bare-earth temperature (a single layer gives the factor 2^0.25 ≈ 1.189).
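
This is not the original program, but the layer arithmetic can be sketched in a few lines of R, continuing the variables from above:

T.one.layer <- 2^0.25 * T.bare              # ~299.8 K with a single greenhouse layer
n.layers <- 1:5
T.n.layers <- (n.layers + 1)^0.25 * T.bare  # surface temperature for N perfectly absorbing layers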

Dust layer

One could also assume a dusty atmosphere layer, which absorbs a fraction of the sun's radiation and emits infrared radiation in all directions. Guess what happens: of course the earth's surface would be cooler. I do not want to go into details here; if you want to know more, look at the code or read David Archer's or another atmospheric physics book. I have not yet been able to combine the two models, the greenhouse and the dust layer, and I will not finish it.

Final remarks

In the real world things are not so simple. The solar constant is not constant throughout the year or over geological time periods, since the distance between sun and earth changes and there are times with more or fewer sunspots. The albedo is not the same everywhere; some clouds and ice are much more reflective than forests or the oceans. A fraction of the solar radiation energy is not converted to infrared radiation and re-emitted, but used for evaporation and photosynthesis. The energy in the evaporated and transpired water vapor is transported elsewhere and released when the water condenses and precipitates. We have neither a closed greenhouse gas layer nor an equally distributed dust layer in the atmosphere.

Most of these processes are implemented in global climate models in a simplified way, which we call parameterization. These models divide the earth's surface into small squares and the atmosphere into boxes above these squares to calculate all known processes. The models are parameterized to our best knowledge to represent reality as well as possible. And these global climate models do not run on a single PC; they need clusters of several tens to hundreds of machines. However, the model described above is already good enough to yield an approximation of the greenhouse gas effect: the observed mean earth surface temperature of roughly 288 K (15°C) lies between the two calculated temperatures of 252 K and 300 K, somewhat closer to the single greenhouse layer model.

Downloading MODIS data with WebService::MODIS

For teaching "Remote sensing in global ecology" with R at the University of Frankfurt together with a colleague, we need a lot of different MODIS products for several regions of the world. Last year we used ModisDownload.R. It is a great tool, since it can download the data and, if the MODIS Reprojection Tool (MRT) is installed, merge several tiles and transform them to other geographical projections. However, it was a bit frustrating, since it stopped downloading quite often and does not support resuming interrupted downloads. So I decided to write my own routine for downloading the MODIS products. For inexperienced users, like our students last year, it is quite difficult to install and run MRT under Windows, therefore we will supply them with ready-made GeoTiffs this year. For this preprocessing I still use ModisDownload.R as an interface to MRT.

After downloading the module from CPAN and installing it and its prerequisites with the usual

perl Makefile.PL
make
make test
make install

you can choose which MODIS products you need for which regions and start writing the Perl program for downloading the data. In the program you then need to load the module with

use WebService::MODIS;

What can my module do?

The module first needs to retrieve the directory structure of the server where the MODIS data resides. This is done with the function

initCache();

This takes a few seconds to minutes. If you want, you can write the cached server structure to disk, for future usage, since the data is lost once you close the current perl process. The following function writes the cache to disk:

writeCache();

This function is also one of the bugs I am aware of, since so far it only works under Linux. I did not know which environment variable to use for user configuration files under MacOS or Windows. I would be happy if someone could help me here.

Once you start a new perl process, you can load the data from the saved configuration files with

readCache();

Optionally you can supply the writeCache and readCache functions with your own path, if you’re not happy with the default one.

If you know which MODIS products, which tiles and which time period you want, you can now either create a download list for use in other download managers or use the built-in download function. There are two ways to create a new object:


my $lct = WebService::MODIS->new(product => "MCD12Q1",
dates => ['2001-01-01', '2010-01-01'], ifExactDates => 1,
h => [11,12], v=> [9,10]);

or

my $lct = WebService::MODIS->new();
$lct->product("MCD12Q1");
$lct->h([11,12]);
$lct->v([9,10]);
$lct->dates(["2001-01-01", "2010-01-01"]);
$lct->ifExactDates(1);

creates an object for the MODIS land cover product MCD12Q1 covering Rondonia, Brazil, for the two years 2001 and 2010. This does not yet contain the details about the URLs. For that the server needs to be contacted again with

$lct->createUrl;

Now the link list for the $lct object is available and can be used either to print the list of URLs

print "$_\n" foreach $lct->url;

or to download the data (>700MB!):

$lct->download();

The download function can take the destination path as an optional argument.

About my first experience with “The Comprehensive Perl Archive Network” (CPAN)

After reading two other webpages I got a rough picture of what I have to do to publish a module on CPAN. I used

h2xs -AXc -n ModisDownload

for creating a skeleton of the module, which I then filled with my code. As you can see, I initially called my module "ModisDownload", like the R library that inspired it. What really motivates you to write documentation is the default one generated by h2xs, which would not have looked good for me if I had left it unedited.

After that I applied for a Perl Authors Upload Server (PAUSE) account, which I got really fast, with the hint to first publish my module on PrePAN, a prerelease server where experienced CPAN users help improve other people's modules. I got one comment: to rename my module, and that the API is not very "perlish". The first point was fine and quickly applied. The second part was not very constructive. However, I tried my best to make my API more perlish by giving it an object-oriented design. Constructive comments for further improvements are welcome. Especially the cache functions still have the old, unperlish(?) API.

Now I finally uploaded the module to CPAN. The module is first sent out to the "CPAN Testers", a group of volunteers who test new modules. The next day I got the report from these testers and, what the f…, all tests had status "FAIL". I must shamefully admit that only then did I run "make test" myself for the first time, and it failed. It still tried to load the module "ModisDownload", but since I had renamed the module to "WebService::MODIS" it had to fail. I fixed it, extended the test a little and tested the test myself before I submitted a new version of my module to CPAN.

Now I hope there are not that many bugs left in the code. I might extend the functionality somewhat, so that GDAL can be used for the transformation, or so that the horizontal and vertical sinusoidal tile numbers can be selected by latitude and longitude; and if I get other suggestions I will see what is possible.

Two hours later: in the meantime it is already at version 1.2. When I looked at the public CPAN page, I realized that the dependencies on other modules were missing. Therefore I added these missing dependencies to the Makefile.PL, which has to be done by hand.

Global top soil sand content

Global soil data for vegetation modelling?

Quite some time ago I uploaded some scripts to GitHub which import global soil data into Grass-GIS. In the following I will demonstrate the usage of the script with the Harmonized World Soil Database (HWSD) as an example. At the end I will briefly discuss some of the data.

The soil data is stored in a MS Access database linked to a raster file. These datasets are freely available in the internet: DSMW, HWSD available through the geonetwork of the FAO and ISRIC-WISE on their webpage.

Grass-GIS is a very powerful tool. I use it mainly to convert geographical data, for calculations and for maps in scientific journals; e.g. the nitrogen and carbon fixation maps in Elbert et al. (2012) were processed and created in Grass-GIS.

My script will store the data in a SQLite data file linked to a Grass-GIS vector map. The individual layers can finally be exported to various other formats for example as input to dynamic global vegetation models.

To use the script you need Grass-GIS with a lat-lon location at 30 arc seconds resolution. Here is the output of the Grass-GIS command "g.region -p", once the location is created:

projection: 3 (Latitude-Longitude)
zone: 0
datum: wgs84
ellipsoid: wgs84
north: 90N
south: 90S
west: 180W
east: 180E
nsres: 0:00:30
ewres: 0:00:30
rows: 21600
cols: 43200
cells: 933120000

Another prerequisite is the Linux package "mdb-tools", with the commands "mdb-export" and "mdb-schema" to handle the Access database, and the Linux package "sqlite3". The following commands are executed in the above Grass-GIS environment. First some environment variables are set. The variable HWSDDIR must point to the directory where the HWSD data was downloaded to. A temporary directory defined by the variable TMPDIR is used for temporary files.

rastBasemap="HWSD_basemap"
vectBasemap="HWSD_basemap"
GISDB=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/`g.gisenv MAPSET`/db.sqlite
HWSDDIR=/data/external/global/Soil/HWSD
TMPDIR=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/tmp.$$

if [ -d $TMPDIR ]; then
  echo "Temporary folder \"$TMPDIR\" exists!"
  echo "Delete it first"
  exit 1
else
  mkdir $TMPDIR
fi

db.connect driver=sqlite database=${GISDB}

Then the underlying raster map is imported and converted to a vector map and connected to the database. The “v.clean” command cleans up the topology and speeds up further processing substantially.

r.in.gdal -o --overwrite input=${HWSDDIR}/hwsd.bil output=${rastBasemap}
r.to.vect --overwrite input=${rastBasemap} output=${vectBasemap}_tmp feature=area
sqlite3 ${GISDB} "ALTER TABLE ${vectBasemap}_tmp ADD area REAL"
v.to.db option=area type=centroid units=k map=${vectBasemap}_tmp column=area

v.clean --overwrite input=${vectBasemap}_tmp output=${vectBasemap} tool=bpol,rmarea thresh=0.4,0.4
g.remove vect=${vectBasemap}_tmp

The following commands create all tables of the original soil database in the SQLite database.

mdb-schema ${HWSDDIR}/HWSD.mdb mysql > ${TMPDIR}/HWSD_CREATE.sql
sqlite3 ${GISDB} < ${TMPDIR}/HWSD_CREATE.sql

The following loop imports all data into the just created tables

for table in `mdb-tables ${HWSDDIR}/HWSD.mdb`; do
  mdb-export -d '|' ${HWSDDIR}/HWSD.mdb $table \
    | sed -e '1d' \
    | sed -e 's/\"//g' > ${TMPDIR}/${table}.csv
  cat << EOF | sqlite3 ${GISDB}
.separator "|"
.import ${TMPDIR}/${table}.csv $table
.quit
EOF
  rm -f ${TMPDIR}/${table}.csv
done

The following loop joins the data of the soil mapping unit with the vector database:

newColNames=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/^\s*//' | sed -e 's/\s.*//' | sed -e 's/\`//g'`)
newColTypes=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/,\s//' | sed -e 's/.*\s\s//' | sed -e 's/\s//g'`)
NnewCols=${#newColNames[*]}

cat /dev/null > ${TMPDIR}/join.sql
for ((i=0; i<${NnewCols}; i++)); do
  echo "ALTER TABLE ${vectBasemap} ADD ${newColNames[$i]} ${newColTypes[$i]};" >> ${TMPDIR}/join.sql
  echo "UPDATE ${vectBasemap} SET ${newColNames[$i]}=(SELECT ${newColNames[$i]} FROM HWSD_SMU WHERE HWSD_SMU.MU_GLOBAL=${vectBasemap}.value);" >> ${TMPDIR}/join.sql
done
sqlite3 ${GISDB} < ${TMPDIR}/join.sql

Now the general data is in the Grass-GIS database and can be exported or displayed. For that I add a temporary column of type integer to the database. The map can be used either as a raster or a vector map. You can define the variable either as a single value, as done here, or you can loop over several values to convert several soil types at once.

soil="AR"
echo "ALTER TABLE ${vectBasemap} ADD isubset INTEGER;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET isubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET isubset=1 WHERE SU_SYMBOL='${soil}';" >> new.sql
echo "UPDATE ${vectBasemap} SET isubset=0 WHERE isubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql
v.to.rast --quiet --overwrite input=${vectBasemap} output=SOIL_${soil} use=attr column=isubset
r.mapcalc "SOIL_tmp=if(SOIL_${soil}==1, 1, null())"
r.to.vect --quiet --overwrite input=SOIL_tmp output=SOIL_${soil} feature=area

There is additional information in the table HWSD_DATA, which is the raw material the SU_SYMBOL was made of. That is the data derived from different other sources like the Digital Soil Map of the World (DSMW), Chinese and European soil data and SOTER. However, the usage of these soil parameters, including physical and chemical properties, needs to be treated with caution, especially across the borders of the mentioned data sources, as I will demonstrate.

Although the HWSD already has a very high resolution of 30 arc seconds, which is slightly less than a kilometer at the equator, one grid cell can consist of up to 10 different soil classes, given by the columns SEQ (counter) and SHARE (percent coverage) in the table HWSD_DATA. The SHARE even goes down to 1%, meaning a soil definition of less than one hectare on a global scale.

echo "ALTER TABLE ${vectBasemap} ADD rsubset REAL;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET rsubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET rsubset=(SELECT SHARE FROM HWSD_DATA WHERE HWSD_DATA.MU_GLOBAL=${vectBasemap}.MU_GLOBAL AND (HWSD_DATA.SU_SYM74 LIKE '${soil}%' OR HWSD_DATA.SU_SYM85 LIKE '${soil}%' OR HWSD_DATA.SU_SYM90 LIKE '${soil}%'));" >> new.sql
echo "UPDATE ${vectBasemap} SET rsubset=0 WHERE rsubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql
v.to.rast --quiet --overwrite input=${vectBasemap} output=SHARE_${soil} type=area use=attr column=rsubset

I calculated the total area of three soil types (Arenosols, Solonetz and Solonchak) and of dune sand based on the fractional coverage as well as on the dominant soil type:

Soil type    dominant   fractional
Arenosol     957        442
Dune sand    357        280
Solonchak    128        88
Solonetz     204        140

And another drawback comes up once you look at the distribution of the soils. The color code in the following map is the fractional coverage, and the black lines surround regions where the soil type is defined as dominant. The Arenosol (upper panel) stops at the Egyptian border and becomes dune sand (lower panel) in Libya. That is because Arenosols were not defined in the DSMW, which is the source for the soil definition in most of the Sahara region.

[Figure AR_DS: fractional coverage of Arenosol (upper panel) and dune sand (lower panel); black lines surround regions where the type is dominant]

And this brings me to my next criticism. If you properly extract the high-resolution texture (in the following I will concentrate on sand only) and calculate a weighted mean over all soil types per pixel, you also see the artefacts of the different source databases:
[Figure T_SAND_mn: weighted mean topsoil sand content]

Those pixel fractions defined as dune sand in the DSMW source regions did not even get any texture information in the HWSD_DATA table. However, the mean sand content of all Arenosols is 90%, and if I give the dune sand this texture property and recalculate the sand content, my map looks much better:
[Figure T_SAND: recalculated topsoil sand content with the dune sand fraction set to 90% sand]

My overall conclusion is that using the HWSD as input for global applications is much too complicated and not straightforward. If you want a consistent dataset of global soil properties, you should stick to the older DSMW or ISRIC-WISE data. Nevertheless, I am also looking forward to SoilGrids, a new dataset of global soil properties.

Visualizing failed SSH login attempts

Shortly after Christmas I opened the ssh port 22 at home to access my network from the Internet while commuting or traveling. To secure the open port I installed fail2ban on the machine I redirected the port to. As soon as the jail was enabled in the file /etc/fail2ban/jail.conf and fail2ban was restarted, it blocked the source IP for 5 minutes after 6 failed login attempts by default. If I remember right, it took only a few minutes until the first IP was blocked. I decided that 5 minutes is too short and extended the ban time to one day, and I reduced the number of allowed login attempts to 3.
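
In jail.conf this corresponds roughly to the following settings for the ssh jail (a sketch based on my installation; the jail name and defaults may differ between fail2ban versions):

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 3
# ban for one day (in seconds)
bantime  = 86400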

Since I use ssh keys, I also decided to deny root login entirely as well as password logins via PAM by modifying /etc/ssh/sshd_config:

PermitRootLogin no
PasswordAuthentication no
UsePAM no

However, this caused some trouble with fail2ban. The log entries of failed login attempts in /var/log/auth.log now looked different, and soon there was someone (more likely a bot) trying to log in 1960 times within 1:15 hours from the IP address 61.183.1.8. Therefore I added the following lines to the failregex in /etc/fail2ban/filter.d/sshd.conf:

^%(__prefix_line)sConnection closed by <HOST> \[preauth\]\s*$
^%(__prefix_line)sReceived disconnect from <HOST>: 11: Bye Bye \[preauth\]\s*$

Now fail2ban blocks the login attempts again.

Before I turned off PAM authentication, I searched the Internet for how to log passwords and came across an article on a Symantec site about using so-called honeypots to attract hackers with a weak system. This inspired me to at least evaluate the data I can extract from my auth.log files. I wrote a Perl script, which creates an SQLite database of the failed logins, and an R script using ggplot2 to display the results, which is even possible without using fail2ban. To find out the subnet and country of the source IPs I found an R function here, which I used for whois lookups. There are other possibilities to visualize the data, too, for example those on the fail2ban webpage itself.
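
Just to illustrate the idea (this is not the actual script): a minimal sketch of the R side, assuming the Perl script wrote a hypothetical table failed_logins with columns time, ip and user into logins.db:

library(DBI)
library(RSQLite)
library(ggplot2)

# hypothetical database and table layout
con <- dbConnect(SQLite(), "logins.db")
logins <- dbGetQuery(con, "SELECT time, ip, user FROM failed_logins")
dbDisconnect(con)

# count failed attempts per day and plot the time series
logins$day <- as.Date(logins$time)
daily <- aggregate(ip ~ day, data=logins, FUN=length)
p <- ggplot(daily, aes(x=day, y=ip)) + geom_line()
p <- p + xlab("Date") + ylab("Failed login attempts")
print(p)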

There were 4628 failed logins in total from 1005 IP addresses (457 subnets) with 296 usernames. And here are my results:

Time series for the one month the port has been open so far:

[Figure ts: time series of failed login attempts]

The usernames used. Note that, since I deny password logins, usernames are only logged when they are specified before connecting (e.g. ssh admin@<host> for command line ssh). Also note that the x-axes are scaled logarithmically and that only a subset of the total usernames, subnets and countries is shown:

[Figure user: attempted usernames]

The subnets, from which most attacks came:

[Figure subnet: subnets from which most attacks came]

The countries:

[Figure country: countries of origin of the attacks]

And finally the map (with this worldmap) with the number of attacks color coded (again logarithmic):

[Figure map: world map with the number of attacks color coded (logarithmic)]

The strange colorbar is somehow related to cairoDevice. However, fonts are much better with it than with the default png device.

Both scripts I wrote are available on GitHub.

Another related link from Cisco: http://www.cisco.com/web/about/security/intelligence/ssh-security.html.

Setting up a Raspberry Pi with Raspbian

Just recently I bought a new toy: a Raspberry Pi B+ and a 16GB SD card. The card is quite large; however, keeping in mind that SD cards have limited write cycles, I decided to buy a large one. I plan to set up a web server on it running nginx under Raspbian in the DMZ (between my own router and the one of my provider). The server should be accessible for me from the internet. Nevertheless, that will become another story. Here I will describe how I set up the Raspberry Pi. As the operating system on my PC I use a mix of Debian testing and unstable. The PC has a (micro-) SD card reader.
Once you have the hardware, you can download the image at http://www.raspberrypi.org/downloads/. Insert the SD card into the reader and find out which disk is your SD card; in my case it is /dev/sdg. As root (or with sudo) write the image to the card by executing

> dd bs=4M if=Downloads/2014-09-09-wheezy-raspbian.img of=/dev/sdg

In other blogs I found that you might need to set bs=1M; however, for me the above worked. Make sure the buffer is written to the card by syncing.

> sync

Increasing the partition size to the full size of the card, as described elsewhere, is not necessary, since that can be done once you have started up Raspbian.
That's it already; now you can power up the Raspberry Pi. I connected the Raspberry Pi to my router to start it headless (without keyboard, mouse and monitor). If you have a local network with DHCP and the DHCP server accepts "send host-name" from the client (in /etc/dhcp/dhclient.conf), you can log in via SSH immediately. Otherwise, you have to find out the IP address, e.g. with nmap, scanning a subnet for the open port 22 (SSH), including some details:

> nmap -sV -p 22 192.168.122.0/24
Starting Nmap 6.47 ( http://nmap.org ) at date time
[...]
Nmap scan report for raspberrypi (192.168.122.173)
Host is up (-0.076s latency).
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 6.0p1 Debian 4+deb7u2 (protocol 2.0)
MAC Address: B8:27:EB:00:00:00 (Raspberry Pi Foundation)
[...]

This gives you the IP address of your Raspberry Pi, and maybe some unexpected results for other devices in your network. Note: the MAC address is only displayed if you are on the same subnet and in my case only if you execute nmap with root privileges. If the MAC address is returned, the Raspberry Pi can be identified by the first 24 bits (Organizationally Unique Identifier: B8:27:EB). Now you can login, but don’t forget to replace the IP address with yours.

> ssh pi@raspberrypi

or

> ssh pi@192.168.122.173

The default username is "pi" and the password is "raspberry", which you should of course change immediately. Increasing the partition size to fill up the whole card can also be done now. Both are possible with the command:

> sudo raspi-config

I will continue running the Raspberry Pi headless, therefore I uninstalled almost all X stuff (e.g. packages starting with "lx"). And since I plan to install a web server with an SQL database, I decided to connect an external hard disk for the /tmp and /var directories. However, I did not find a way of switching to init 1 without losing my ssh connection, which would be needed so that no process has open file handles on /var. So I only moved /tmp, /var/www and /var/lib/mysql to the external hard disk.

Installing a git server for our working group

In our working group we have been discussing for quite a while now whether to install our own version control system. However, it was always a question of manpower, so nothing happened. At the end of last week I decided to install a git server on an unused computer in my office. I must say, it was much easier than I thought. After a quick web search I chose gitolite together with gitweb, which is part of git. The operating system is Ubuntu 14.04, but I assume it is very similar for all Debian-like systems (without using sudo all the time, if you open a root shell in Debian). The main part of the information is from two other webpages, on ubuntuforums and Ubuntu help. In the following I describe what I did and partly why. Only one minor problem persisted till the end. You have to modify some configuration files with your text editor of choice. On my local machine I prefer Emacs; nevertheless, my choice over the network is vi, since it is much faster.

1.) Install the software

> sudo apt-get install gitolite gitweb

2.) Create user git

> sudo adduser --system --shell /bin/bash --group --disabled-password --home /home/git git

3.) Adjust projectroot and projects_list in /etc/gitweb.conf. I will use the user git to run gitolite, therefore I modified the corresponding lines in the config file:

$projectroot = "/home/git/repositories/";
$projects_list = "/home/git/projects.list";

4.) Configure the webserver, in my case I use Apache

> sudo cp -i /etc/apache2/conf.d/gitweb /etc/apache2/conf-available/gitweb.conf
> cd /etc/apache2/conf-enabled
> sudo ln -s ../conf-available/gitweb.conf

5.) Modify the Apache configuration: add “+” before “FollowSymLinks”

> sudo vi gitweb.conf

So that the config file looks as follows:

Alias /gitweb /usr/share/gitweb

<Directory /usr/share/gitweb>
  Options +FollowSymLinks +ExecCGI
  AddHandler cgi-script .cgi
</Directory>

6.) Add the user www-data to the group git, so that apache can read the repositories (www-data, or whatever UID apache is running under). You can do this either by modifying /etc/group by hand or by command:

> sudo vi /etc/group

or

> sudo usermod -a -G git www-data

7.) Enable the Apache CGI module and restart the server.

> sudo a2enmod cgi
> sudo service apache2 restart

8.) Gitolite uses a git repository for its own configuration, so that I don't need to do everything under the UID git. For authentication, ssh keys are used. You have to copy the public ssh key (on Linux/Unix systems: ~/.ssh/id_rsa.pub) to a directory readable by the user git. The file name is the username for gitolite; for example, you can use the user you are logged in as. If you have no ssh key pair yet, create one first and follow the instructions:

> ssh-keygen
> cp ~/.ssh/id_rsa.pub /tmp/admin.pub

9.) Change to user git and initialize Gitolite

> sudo su - git
> gl-setup /tmp/admin.pub

10.) In the configuration file (~/.gitolite.rc) change $REPO_UMASK from 0077 to 0027, otherwise the webserver won’t be able to read the repositories and project lists. And then exit user git

> vi ~/.gitolite.rc
> exit

11.) Set up git global vars, if not previously done. And clone the configuration repository.

> git config --global user.email "your.email@domain.org"
> git config --global user.name "admin"
> git clone git@my.gitserver.org:gitolite-admin.git

12.) Add users by adding their public ssh key to the subfolder “keydir” in the just created repository “gitolite-admin”. Don’t forget to add the file to the repository!

cd gitolite-admin
git add keydir/newuser.pub

13.) Create new repositories in gitolite-admin/conf/gitolite.conf and commit changes and push them back to the server.

git commit -a -m "created new repository and added user newuser"
git push -u origin master

14.) Now comes the tricky point, which took a while to figure out, and as always I thought afterwards: "That was easy!" The repositories were not listed on the gitweb webpage (http://my.gitserver.org/gitweb/). If you want a repository to be displayed on gitweb, the user "gitweb" needs to be added to the repository's access list in ~/gitolite-admin/conf/gitolite.conf.

All users for whom you added an ssh key to the gitolite-admin repository can now access those repositories for which you granted read and/or write access in the configuration file. All users connect via the UID git, e.g.

git clone git@my.gitserver.org:newrep.git

And now I and my colleagues can access and share the repositories we work on together, or we could also use it for manuscripts. If you want to follow the above description, don't forget to replace the usernames with your appropriate usernames and "my.gitserver.org" with your server name or IP address.