Downloading MODIS data with WebService::MODIS

For teaching "Remote sensing in global ecology" at the University of Frankfurt together with a colleague, we need a lot of different MODIS products for several regions of the world. Last year we used ModisDownload.R. It is a great tool, since it can download the data and, if the MODIS Reprojection Tool (MRT) is installed, merge several tiles together and transform them to other geographical projections. However, it was a bit frustrating, since it stopped downloading quite often and does not support continuing interrupted downloads. So I decided to write my own routine for downloading the MODIS products. For inexperienced users, like our students last year, it is quite difficult to install and run MRT under Windows, therefore we will supply them with ready-made GeoTiffs this year. For this preprocessing I still use ModisDownload.R as an interface to MRT.

After downloading the module from CPAN and installing it and its prerequisites with the usual

perl Makefile.PL
make
make test
make install

you can choose which MODIS products you need for which regions and start writing the Perl program for downloading the data. In the Perl program you then need to load the module with

use WebService::MODIS;

What can my module do?

The module first needs to retrieve the directory structure of the server where the MODIS data resides. This is done with the function

initCache();

This takes a few seconds to minutes. If you want, you can write the cached server structure to disk for future use, since the data is lost once you close the current Perl process. The following function writes the cache to disk:

writeCache();

And this function is already one of the bugs I am aware of, since so far it only works under Linux. I do not know which environment variable I could use for user-space configuration files under macOS or even Windows. I would be happy if someone could help me here.

Once you start a new perl process, you can load the data from the saved configuration files with

readCache();

Optionally you can supply the writeCache and readCache functions with your own path, if you’re not happy with the default one.
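Taken together, the caching could look like this (a minimal sketch; the cache directory used here is only an example, the default path works just as well):

use WebService::MODIS;

initCache();                             # contact the server once
writeCache("$ENV{HOME}/.modis-cache");   # save the structure for later sessions

# in a later session:
readCache("$ENV{HOME}/.modis-cache");    # restore the cache without contacting the server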

If you know which MODIS products, which tiles and which time period you want, you can now either create a download list for use in other download managers or use the built-in download function. Two options exist to create a new object:


my $lct = WebService::MODIS->new(product => "MCD12Q1",
                                 dates => ['2001-01-01', '2010-01-01'],
                                 ifExactDates => 1,
                                 h => [11, 12], v => [9, 10]);

or

my $lct = WebService::MODIS->new();
$lct->product("MCD12Q1");
$lct->h([11,12]);
$lct->v([9,10]);
$lct->dates(["2001-01-01", "2010-01--01"]);
$lct->ifExactDates(1);

creates an object for the MODIS land cover product MCD12Q1 covering Rondonia, Brazil, for the two years 2001 and 2010. This does not yet contain the details about the URLs. For that, the server needs to be contacted again with

$lct->createUrl;

Now the link list for the $lct object is available and can be used either to print the list of URLs

print "$_\n" foreach $lct->url;

or to download the data (>700MB!):

$lct->download();

The download function can take the destination path as an optional argument.
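Putting the pieces together, a complete download into a specific directory could look like this (a sketch; the destination path is only an example):

use WebService::MODIS;

readCache();   # or initCache() if no cache has been written yet

my $lct = WebService::MODIS->new(product => "MCD12Q1",
                                 dates => ['2001-01-01', '2010-01-01'],
                                 ifExactDates => 1,
                                 h => [11, 12], v => [9, 10]);
$lct->createUrl;
print "$_\n" foreach $lct->url;          # check the URLs first
$lct->download("/data/MODIS/MCD12Q1");   # download into the given directory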

About my first experience with “The Comprehensive Perl Archive Network” (CPAN)

After reading two other webpages I got a rough picture of what I have to do to publish a module on CPAN. I used

h2xs -AXc -n ModisDownload

for creating a skeleton of the module, which I then filled with my code. As you can see, I initially called my module "ModisDownload", like the R library that inspired me. A real motivation to write proper documentation is the generated default one, which would not have reflected well on me had I left it unedited.

After that I applied for a Perl Authors Upload Server (PAUSE) account, which I got really fast, together with the hint to first publish my module on PrePAN, a pre-release server where experienced CPAN users help to improve other people's modules. I got one comment: that I should rename my module and that the API is not very "perlish". The first point was fine and I applied it quite fast. The second part is not very constructive. However, I tried my best to make the API more perlish by giving it an object-oriented design. Constructive comments for further improvements are welcome; especially the cache functions still have the old, unperlish(?) API.

Now I finally uploaded the module to CPAN. The module is first sent out to "CPAN Testers", a group of volunteers who test new modules. The next day I got the report from these testers and, what the f…, all tests had status "FAIL". I must shamefully admit that only now did I run "make test" myself for the first time, and it failed: it still tried to load the module "ModisDownload", but since I had renamed the module to "WebService::MODIS" it had to fail. I fixed it, extended the test a little and tested the test myself before I submitted a new version of my module to CPAN.

Now I hope there are not too many bugs left in the code. I might extend the functionality somewhat, for example so that GDAL can be used for the transformation, or so that the horizontal and vertical sinusoidal tile numbers can be selected by latitude and longitude; and if I get other suggestions I will see what is possible.

Two hours later: In the meantime it is already at version 1.2. When I looked at the public CPAN page, I realized that the dependencies on other modules were missing. Therefore I added the missing dependencies to Makefile.PL, which has to be done by hand.

Global top soil sand content

Global soil data for vegetation modelling?

Quite some time ago I uploaded some scripts to GitHub which import global soil data into Grass-GIS. In the following I will demonstrate the usage of the script with the Harmonized World Soil Database (HWSD) as an example. At the end I will briefly discuss some of the data.

The soil data is stored in an MS Access database linked to a raster file. These datasets are freely available on the internet: the DSMW and HWSD through the geonetwork of the FAO, and ISRIC-WISE on their webpage.

Grass-GIS is a very powerful tool. I use it mainly to convert geographical data, for calculations and for maps in scientific journals; e.g. the nitrogen and carbon fixation maps in Elbert et al. (2012) were processed and created in Grass-GIS.

My script will store the data in a SQLite data file linked to a Grass-GIS vector map. The individual layers can finally be exported to various other formats for example as input to dynamic global vegetation models.

To use the script you need Grass-GIS with a lat-lon location and 30 arc seconds resolution. Here is the output of the Grass-GIS command "g.region -p", once the location is created:

projection: 3 (Latitude-Longitude)
zone: 0
datum: wgs84
ellipsoid: wgs84
north: 90N
south: 90S
west: 180W
east: 180E
nsres: 0:00:30
ewres: 0:00:30
rows: 21600
cols: 43200
cells: 933120000

Another prerequisite is the Linux package "mdb-tools", with the commands "mdb-export" and "mdb-schema" to handle the Access database, as well as the Linux package "sqlite3". The following commands are executed in the above Grass-GIS environment. First some environment variables are set. The variable HWSDDIR must point to the directory where the HWSD data was downloaded to. A temporary directory defined by the variable TMPDIR is created for temporary files.

rastBasemap="HWSD_basemap"
vectBasemap="HWSD_basemap"
GISDB=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/`g.gisenv MAPSET`/db.sqlite
HWSDDIR=/data/external/global/Soil/HWSD
TMPDIR=`g.gisenv GISDBASE`/`g.gisenv LOCATION_NAME`/tmp.$$

if [ -d "$TMPDIR" ]; then
  echo "Temporary folder \"$TMPDIR\" exists!"
  echo "Delete it first"
  exit 1
else
  mkdir "$TMPDIR"
fi

db.connect driver=sqlite database=${GISDB}

Then the underlying raster map is imported and converted to a vector map and connected to the database. The “v.clean” command cleans up the topology and speeds up further processing substantially.

r.in.gdal -o --overwrite input=${HWSDDIR}/hwsd.bil output=${rastBasemap}
r.to.vect --overwrite input=${rastBasemap} output=${vectBasemap}_tmp feature=area
sqlite3 ${GISDB} "ALTER TABLE ${vectBasemap}_tmp ADD area REAL"
v.to.db option=area type=centroid units=k map=${vectBasemap}_tmp column=area

v.clean --overwrite input=${vectBasemap}_tmp output=${vectBasemap} tool=bpol,rmarea thresh=0.4,0.4
g.remove vect=${vectBasemap}_tmp

The following commands create the tables in the SQLite database corresponding to all tables in the original soil database.

mdb-schema ${HWSDDIR}/HWSD.mdb mysql > ${TMPDIR}/HWSD_CREATE.sql
sqlite3 ${GISDB} < ${TMPDIR}/HWSD_CREATE.sql

The following loop imports all data into the newly created tables:

for table in `mdb-tables ${HWSDDIR}/HWSD.mdb`; do
  mdb-export -d '|' ${HWSDDIR}/HWSD.mdb $table \
    | sed -e '1d' \
    | sed -e 's/\"//g' > ${TMPDIR}/${table}.csv
  cat << EOF | sqlite3 ${GISDB}
.separator "|"
.import ${TMPDIR}/${table}.csv $table
.quit
EOF
  rm -f ${TMPDIR}/${table}.csv
done

The following loop joins the data of the soil mapping unit with the vector database:

newColNames=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/^\s*//' | sed -e 's/\s.*//' | sed -e 's/\`//g'`)
newColTypes=(`sqlite3 ${GISDB} '.schema HWSD_SMU' | sed -e '1,2d' | sed -e '$d' | sed -e 's/,\s//' | sed -e 's/.*\s\s//' | sed -e 's/\s//g'`)
NnewCols=${#newColNames[*]}

cat /dev/null > ${TMPDIR}/join.sql
for ((i=0; i<${NnewCols}; i++)); do
  echo "ALTER TABLE ${vectBasemap} ADD ${newColNames[$i]} ${newColTypes[$i]};" >> ${TMPDIR}/join.sql
  echo "UPDATE ${vectBasemap} SET ${newColNames[$i]}=(SELECT ${newColNames[$i]} FROM HWSD_SMU WHERE HWSD_SMU.MU_GLOBAL=${vectBasemap}.value);" >> ${TMPDIR}/join.sql
done
sqlite3 ${GISDB} < ${TMPDIR}/join.sql

Now the general data is in the Grass-GIS database and can be exported or displayed. For that I add a temporary column of type integer to the database. The map can be used either as a raster or as a vector map. You can either define the variable as a single value, as done here, or create a loop over several values to convert several soil types at once.

soil="AR"
echo "ALTER TABLE ${vectBasemap} ADD isubset INTEGER;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET isubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET isubset=1 WHERE SU_SYMBOL='${soil}';" >> new.sql
echo "UPDATE ${vectBasemap} SET isubset=0 WHERE isubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql
v.to.rast --quiet --overwrite input=${vectBasemap} output=SOIL_${soil} use=attr column=isubset
r.mapcalc "SOIL_tmp=if(SOIL_${soil}==1, 1, null())"
r.to.vect --quiet --overwrite input=SOIL_tmp output=SOIL_${soil} feature=area

There is additional information in the table HWSD_DATA, which is the raw material the SU_SYMBOL was made of. That is the data derived from different other sources such as the Digital Soil Map of the World (DSMW), Chinese and European soil data, and SOTER. However, these soil parameters, including the physical and chemical properties, need to be used with caution, especially across the borders of the mentioned data sources, as I will demonstrate.

Although the HWSD already has a very high resolution of 30 arc seconds, which is slightly less than a kilometer at the equator, one grid cell can consist of up to 10 different soil classes, given by the columns SEQ (counter) and SHARE (percent coverage) in the table HWSD_DATA. The SHARE even goes down to 1%, meaning a soil definition of less than one hectare on a global scale.
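To get a feeling for this, the composition of a single mapping unit can be inspected directly in the imported database; the mapping unit number used here is just a made-up example:

sqlite3 ${GISDB} "SELECT SEQ, SHARE, SU_SYM90 FROM HWSD_DATA WHERE MU_GLOBAL=11942 ORDER BY SEQ;"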

echo "ALTER TABLE ${vectBasemap} ADD rsubset REAL;" | sqlite3 ${GISDB}
echo "UPDATE ${vectBasemap} SET rsubset=0;" > new.sql
echo "UPDATE ${vectBasemap} SET rsubset=(SELECT SHARE FROM HWSD_DATA WHERE HWSD_DATA.MU_GLOBAL=${vectBasemap}.MU_GLOBAL AND (HWSD_DATA.SU_SYM74 LIKE '${soil}%' OR HWSD_DATA.SU_SYM85 LIKE '${soil}%' OR HWSD_DATA.SU_SYM90 LIKE '${soil}%'));" >> new.sql
echo "UPDATE ${vectBasemap} SET rsubset=0 WHERE rsubset IS NULL;" >> new.sql
sqlite3 ${GISDB} < new.sql
v.to.rast --quiet --overwrite input=${vectBasemap} output=SHARE_${soil} type=area use=attr column=rsubset

I calculated the total area of three soil types (Arenosols, Solonetz and Solonchak) and of dune sand, based on the fractional coverage as well as on the dominant soil type:

Soil type    dominant  fractional
Arenosol          957         442
Dune sand         357         280
Solonchak         128          88
Solonetz          204         140

And another drawback comes up once you look at the distribution of the soils. The color code in the following map is the fractional coverage, and the black lines surround regions where the soil type is defined as dominant. The Arenosol (upper panel) stops at the Egyptian border and becomes dune sand (lower panel) in Libya. That is because Arenosols were not defined in the DSMW, which is the source of the soil definition in most of the Sahara region.

[Figure AR_DS: fractional coverage of Arenosols (upper panel) and dune sand (lower panel), with regions where the type is dominant outlined in black]

And this brings me to my next criticism. If you properly extract the high-resolution texture (in the following I will concentrate on sand only) and calculate a weighted mean over all soil types within a pixel, you also see the artefacts of the different source databases:
[Figure T_SAND_mn: weighted mean topsoil sand content calculated from the original HWSD_DATA texture values]

Those pixel fractions defined as dune sand in the DSMW source regions did not even get any texture information in the HWSD_DATA table. However, the mean sand content of all Arenosols is 90%, and if I give dune sand this texture property and recalculate the sand content, my map looks much better:
[Figure T_SAND: weighted mean topsoil sand content after assigning 90% sand to the dune sand fractions]

My overall conclusion is that using the HWSD as input for global applications is much too complicated and not straightforward. If you want a consistent dataset of global soil properties, you should stick to the older DSMW or ISRIC-WISE data. Nevertheless, I am also looking forward to SoilGrids, a new dataset of global soil properties.

Visualizing failed SSH login attempts

Shortly after Christmas I opened the SSH port 22 at home to access my network from the Internet while commuting or traveling. To secure the open port I installed fail2ban on the machine the port is redirected to. As soon as the jail is enabled in the file /etc/fail2ban/jail.conf and fail2ban is restarted, it blocks the source IP for 5 minutes after 6 failed login attempts by default. If I remember right, it took only a few minutes until the first IP was blocked. I decided that 5 minutes is too short and extended the ban time to one day, and I also reduced the number of allowed login attempts to 3.
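Those settings correspond roughly to the following jail options (a sketch, not my exact file; on Debian-like systems the section is called [ssh] in /etc/fail2ban/jail.conf, and a jail.local override works just as well):

[ssh]
enabled  = true
port     = ssh
filter   = sshd
logpath  = /var/log/auth.log
maxretry = 3
bantime  = 86400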

Since I use SSH keys, I also decided to deny root logins altogether and to disable password logins via PAM by modifying /etc/ssh/sshd_config:

PermitRootLogin no
PasswordAuthentication no
UsePAM no

However, this caused some trouble with fail2ban: the log entries of failed login attempts in /var/log/auth.log now look different. And soon there was someone (more likely a bot?) trying to log in 1960 times within one and a quarter hours from the IP address 61.183.1.8. Therefore, I added the following lines to failregex in /etc/fail2ban/filter.d/sshd.conf:

^%(__prefix_line)sConnection closed by <HOST> \[preauth\]\s*$
^%(__prefix_line)sReceived disconnect from <HOST>: 11: Bye Bye \[preauth\]\s*$

Now fail2ban blocks the login attempts again.

Before I turned off PAM authentication, I searched the Internet for how to log passwords and came across this article on a Symantec site about using so-called honeypots to attract hackers with a weak system. This inspired me to at least evaluate the data I can extract from my auth.log files. I wrote a Perl script which creates an SQLite database of the failed logins, and an R script using ggplot2 to display the results, which is possible even without using fail2ban. To find out the subnet and country of the source IP I found a function for R here, which I used for whois lookups. There are other possibilities to visualize the data, too, for example on the fail2ban webpage itself.
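The core of such a log evaluation is simple; the following is only a minimal sketch (not the script from the GitHub repository) that counts failed SSH connections per source IP in an auth.log file:

#!/usr/bin/perl
# Minimal sketch: count failed SSH connections per source IP in auth.log.
use strict;
use warnings;

my %attempts;
open my $fh, '<', '/var/log/auth.log' or die "auth.log: $!";
while (<$fh>) {
    next unless /sshd\[\d+\]/;                       # only sshd entries
    next unless /Failed|Connection closed|Received disconnect/;
    $attempts{$1}++ if /(\d{1,3}(?:\.\d{1,3}){3})/;  # first IPv4 address in the line
}
close $fh;

# print IPs sorted by number of attempts
foreach my $ip (sort { $attempts{$b} <=> $attempts{$a} } keys %attempts) {
    printf "%-15s %d\n", $ip, $attempts{$ip};
}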

There were 4628 failed logins in total, from 1005 IP addresses (457 subnets), with 296 usernames. And here are my results:

The time series for the one month the port has been open so far:

[Figure ts: time series of failed login attempts over the month]

The usernames used. Note that since I deny password logins, usernames are only logged when they are supplied before connecting (e.g. ssh admin@<host> for command-line ssh). Also note that the x-axes are scaled logarithmically and that only a subset of all usernames, subnets and countries is shown:

[Figure user: number of failed logins per username]

The subnets, from which most attacks came:

[Figure subnet: number of failed logins per subnet]

The countries:

[Figure country: number of failed logins per country]

And finally the map (made with this worldmap) with the number of attacks color coded (again on a logarithmic scale):

[Figure map: world map with the number of attacks per country, color coded on a logarithmic scale]

The strange colorbar is somehow related to cairoDevice. However, fonts are much better with it than with the default png device.

Both scripts I wrote are available on GitHub.

Another related link from Cisco: http://www.cisco.com/web/about/security/intelligence/ssh-security.html.

Setting up a Raspberry Pi with Raspbian

Just recently I bought a new toy: a Raspberry Pi B+ and a 16GB SD card. The card is quite large; however, keeping in mind that SD cards have limited write cycles, I decided to buy a large one. I plan to set up a web server on it running nginx under Raspbian, placed in the DMZ (between my own router and the one of my provider). The server should be accessible for me from the internet. Nevertheless, that will become another story. Here I will describe how I set up the Raspberry Pi. As operating system on my PC I use a mix of Debian testing and unstable. The PC has a (micro-) SD card reader.
Once you have the hardware, you can download the image at http://www.raspberrypi.org/downloads/. Insert the SD card into the reader and find out which disk is your SD card; in my case it is /dev/sdg. As root (or with sudo) write the image to the card by executing

> dd bs=4M if=Downloads/2014-09-09-wheezy-raspbian.img of=/dev/sdg

In other blogs I found that you might need to set bs=1M; however, for me the above worked. Make sure the buffer is written to the card by syncing:

> sync

Increasing the partition size to the full size of the card, as described elsewhere, is not necessary, since that can be done once you start up Raspbian.
That's it already; now you can power up the Raspberry Pi. I connected the Raspberry Pi to my router to start it headless (without keyboard, mouse and monitor). If you have a local network with DHCP, and the server accepts "send host-name" from the client (set in /etc/dhcp/dhclient.conf), you can log in via SSH immediately. Otherwise, you have to find out the IP address, e.g. with nmap, scanning a subnet for the open port 22 (SSH), including some details:

> nmap -sV -p 22 192.168.122.0/24
Starting Nmap 6.47 ( http://nmap.org ) at date time
[...]
Nmap scan report for raspberrypi (192.168.122.173)
Host is up (-0.076s latency).
PORT STATE SERVICE VERSION
22/tcp open ssh OpenSSH 6.0p1 Debian 4+deb7u2 (protocol 2.0)
MAC Address: B8:27:EB:00:00:00 (Raspberry Pi Foundation)
[...]

This gives you the IP address of your Raspberry Pi, and maybe some unexpected results for other devices in your network. Note: the MAC address is only displayed if you are on the same subnet and, in my case, only if you execute nmap with root privileges. If the MAC address is returned, the Raspberry Pi can be identified by the first 24 bits (Organizationally Unique Identifier: B8:27:EB). Now you can log in, but don't forget to replace the IP address with yours.

> ssh pi@raspberrypi

or

> ssh pi@192.168.122.173

The default username is "pi" and the password is "raspberry", which you should of course change immediately. Increasing the partition size to fill the whole card can also be done now. Both are possible with the command:

> sudo raspi-config

I will continue running the Raspberry Pi headless, therefore I uninstalled almost all X stuff (e.g. packages starting with "lx"). And since I plan to install a web server with an SQL database, I decided to connect an external hard disk for the /tmp and /var directories. However, I didn't find a way of switching to init 1 without losing my SSH connection, so that no process would have open file handles on /var 😦 So I only moved /tmp, /var/www and /var/lib/mysql to the external hard disk.
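One way to do this (a sketch under assumptions: the external disk is /dev/sda1, formatted ext4, mounted at /mnt/external, and the directories were copied over beforehand) is with bind mounts in /etc/fstab:

# external disk plus bind mounts for the relocated directories
/dev/sda1            /mnt/external    ext4  defaults,noatime  0  2
/mnt/external/tmp    /tmp             none  bind              0  0
/mnt/external/www    /var/www         none  bind              0  0
/mnt/external/mysql  /var/lib/mysql   none  bind              0  0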

Installing a git server for our working group

In our working group we have been discussing for quite a while now installing our own version control system. However, it was always a question of manpower, so nothing happened. At the end of last week I decided to install a git server on an unused computer in my office. I must say, it was much easier than I thought. After a quick web search I chose gitolite together with gitweb, which is part of git. The operating system is Ubuntu 14.04, but I assume it is very similar for all Debian-like systems (without the use of sudo all the time, if you open one root shell in Debian). The main part of the information is from two other webpages, on ubuntuforums and Ubuntu help. In the following I describe what I did and partly why. One minor problem persisted till the end. You have to modify some configuration files with your text editor of choice. On my local machine I prefer Emacs; nevertheless, my choice over the network is vi, since it is much faster.

1.) Install the software

> sudo apt-get install gitolite gitweb

2.) Create user git

> sudo adduser --system --shell /bin/bash --group --disabled-password --home /home/git git

3.) Adjust projectroot and projects_list in /etc/gitweb.conf. I will use the user git to run gitolite, therefore I modified the corresponding lines in the config file:

$projectroot = "/home/git/repositories/";
$projects_list = "/home/git/projects.list";

4.) Configure the web server; in my case I use Apache:

> sudo cp -i /etc/apache2/conf.d/gitweb /etc/apache2/conf-available/gitweb.conf
> cd /etc/apache2/conf-enabled
> sudo ln -s ../conf-available/gitweb.conf

5.) Modify the Apache configuration: add “+” before “FollowSymLinks”

> sudo vi gitweb.conf

so that the config file looks as follows:

Alias /gitweb /usr/share/gitweb

<Directory /usr/share/gitweb>
  Options +FollowSymLinks +ExecCGI
  AddHandler cgi-script .cgi
</Directory>

6.) Add the user www-data to the group git, so that apache can read the repositories (www-data, or whatever UID apache is running under). You can do this either by modifying /etc/group by hand or by command:

> sudo vi /etc/group

or

> sudo usermod -a -G git www-data

7.) Enable the Apache CGI module and restart the server.

> sudo a2enmod cgi
> sudo service apache2 restart

8.) Gitolite uses a git repository for its own configuration, so I don't need to do everything under the UID git. For authentication, SSH keys are used. You have to copy the public SSH key (on Linux/Unix systems: ~/.ssh/id_rsa.pub) to a directory readable by the user git. The file name is the username for gitolite; for example you can use the user you are logged in as. If you have no SSH key pair yet, create one first and follow the instructions:

> ssh-keygen
> cp ~/.ssh/id_rsa.pub /tmp/admin.pub

9.) Change to user git and initialize Gitolite

> sudo su - git
> gl-setup /tmp/admin.pub

10.) In the configuration file (~/.gitolite.rc) change $REPO_UMASK from 0077 to 0027, otherwise the webserver won’t be able to read the repositories and project lists. And then exit user git

> vi ~/.gitolite.rc
> exit

11.) Set up the git global variables, if not previously done, and clone the configuration repository.

> git config --global user.email "your.email@domain.org"
> git config --global user.name "admin"
> git clone git@my.gitserver.org:gitolite-admin.git

12.) Add users by adding their public ssh key to the subfolder “keydir” in the just created repository “gitolite-admin”. Don’t forget to add the file to the repository!

cd gitolite-admin
git add keydir/newuser.pub

13.) Create new repositories in gitolite-admin/conf/gitolite.conf and commit changes and push them back to the server.

git commit -a -m "created new repository and added user newuser"
git push -u origin master

14.) Now comes the tricky part, which took a while to figure out (and, as always, afterwards I thought "that was easy!"). The repositories were not listed on the gitweb page (http://my.gitserver.org/gitweb/). If you want a repository to be displayed on gitweb, the user "gitweb" needs to be added to the repository's access list in ~/gitolite-admin/conf/gitolite.conf.
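For example, an entry in gitolite.conf for a repository that the admin can write to and that should show up on gitweb could look like this (repository and user names are just placeholders):

repo newrep
    RW+     =   admin
    R       =   gitweb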

All users for whom you added an SSH key to the gitolite-admin repository can now access those repositories for which you granted read and/or write access in the configuration file. All users connect as the UID git, e.g.

git clone git@my.gitserver.org:newrep.git

And now my colleagues and I can access and share the repositories we work on together, and we could also use it for manuscripts. If you want to follow the above description, don't forget to replace the usernames with your appropriate usernames and "my.gitserver.org" with your server name or IP address.