Broken axis with ggplot2

To visualize my data I use R and the ggplot2 library. I recently ran some sensitivity simulations with our dynamic global vegetation model (DGVM) LPJ-GUESS. While summarizing the data per ecosystem and taking a first look, I realized that one ecosystem has values up to 10 times higher than all the others. That made me search for “broken axis”, but I didn’t find a satisfying solution, so I had to create my own.

The data used in this example can be downloaded from my Dropbox (I hope I don’t delete it). First the required libraries and data must be loaded and I define a function for the base plot, which I also run immediately.

library(ggplot2)
library(Cairo)
load("data.sum.RData")
base.plot <- function(data) {
  p <- ggplot(data, aes(x=value, y=name, col=sens))
  p <- p + theme_bw()
  p <- p + theme(legend.position="bottom")
  p <- p + geom_point(size=2.5, position=position_jitter(width=0, height=0.15), alpha=0.8)
  p <- p + scale_color_brewer(palette="Set1", guide=guide_legend(ncol=6, title=NULL))
  p <- p + xlab("") + ylab("")
  return(p)
}
p <- base.plot(data.sum)
CairoPNG(filename="base_plot.png", width=640, height=320)
print(p)
dev.off()

[Figure: base_plot.png]

Here you can see the large offset between “desert” and the other ecosystems. I therefore added a “desert mask” column to my data.frame and rescaled the desert values so that they are still larger than the maximum of the others. I then created custom breaks and labels from the minimum desert value and the maximum value of the others. The step between the labels is 10 here.

## flag the outlier ecosystem
data.sum$mask <- 0
data.sum$mask[data.sum$name == "desert"] <- 1
max.value <- max(data.sum$value)
max.value.other <- max(data.sum$value[data.sum$name != "desert"])
min.value.desert <- min(data.sum$value[data.sum$name == "desert"])
## integer factor that compresses the desert values but keeps them above all others
scale <- floor(min.value.desert / max.value.other) - 1
data.sum$value[data.sum$mask == 1] <- data.sum$value[data.sum$mask == 1] / scale
## distance between the axis labels
step <- 10
low.end <- max(data.sum$value[data.sum$name != "desert"])
up.start <- ceiling(max(data.sum$value[data.sum$name != "desert"]))
## break positions in rescaled units; labels of the desert part are multiplied back by 'scale'
breaks <- seq(0, max(data.sum$value), step)
labels <- seq(0, low.end+step, step)
labels <- append(labels, scale * seq(from=ceiling((up.start + step) / step) * step, length.out=length(breaks) - length(labels), by=step))
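As a rough illustration of the arithmetic, here is a sketch with made-up numbers. The `toy` data frame and every name starting with `toy.` are hypothetical and not part of the LPJ-GUESS output. It shows how the integer factor is derived and how a tick position in the rescaled desert part maps back to the label that gets printed.

```r
## Hypothetical toy data: three ordinary ecosystems and one outlier
toy <- data.frame(
  name  = c("forest", "savanna", "tundra", "desert", "desert"),
  value = c(12, 48, 75, 520, 690)
)
toy.desert <- toy$name == "desert"

## ratio of the smallest desert value to the largest other value: 520 / 75 = 6.93
toy.ratio <- min(toy$value[toy.desert]) / max(toy$value[!toy.desert])
## rounded down and reduced by one, as in the code above: floor(6.93) - 1 = 5
toy.scale <- floor(toy.ratio) - 1

## compressed plotting positions of the desert values: 104 and 138
toy.pos <- toy$value[toy.desert] / toy.scale

## a tick drawn at position 120 in the desert part gets the label 120 * 5 = 600
toy.label <- 120 * toy.scale
```

Because the factor is smaller than the ratio, the compressed desert values (104 and 138) still lie to the right of the largest other value (75).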

The rescaled data can now be plotted using facet_grid, which shows a clear break in the axis:

p <- base.plot(data.sum)
p <- p + facet_grid(. ~ mask, scales="free", space="free")
p <- p + scale_x_continuous(breaks=breaks, labels=labels, expand=c(0.075,0))
p <- p + theme(strip.background = element_blank(), strip.text.x = element_blank())
CairoPNG(filename="broken_axis.png", width=640, height=320)
print(p)
dev.off()

[Figure: broken_axis.png]
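If the Dropbox file is no longer available, the same layout can be tried with a few invented numbers. This is only a sketch: the `toy` data frame is hypothetical, its desert values are assumed to be already divided by a factor of 5, and the colour aesthetic is left out.

```r
library(ggplot2)

## Hypothetical stand-in data; the desert values (originally 520 and 690)
## are assumed to be already compressed by toy.scale = 5
toy.scale <- 5
toy <- data.frame(
  name  = c("forest", "savanna", "tundra", "desert", "desert"),
  value = c(12, 48, 75, 104, 138),
  mask  = c(0, 0, 0, 1, 1)
)

## tick positions in plot units; ticks right of the largest other value (75)
## belong to the desert part and are labelled in original units
toy.breaks <- seq(0, 140, 10)
toy.labels <- ifelse(toy.breaks > 75, toy.breaks * toy.scale, toy.breaks)

## one panel per mask value, each with its own x range and proportional width
p.toy <- ggplot(toy, aes(x = value, y = name)) +
  geom_point(size = 2.5) +
  facet_grid(. ~ mask, scales = "free", space = "free") +
  scale_x_continuous(breaks = toy.breaks, labels = toy.labels) +
  theme_bw() +
  theme(strip.background = element_blank(), strip.text.x = element_blank())
```

Calling `print(p.toy)` draws the plot. Each panel only shows the breaks that fall inside its own range, which is what produces the visual gap in the axis.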

UPDATE: After a comment via Twitter, I will also show a plot with a logarithmic x-axis. In my opinion the “broken axis” above still looks better, although it is not a statistically clean approach.

## data.sum$value was rescaled above, so reload the original values first
load("data.sum.RData")
p <- base.plot(data.sum)
CairoPNG(filename="log10_axis.png", width=640, height=320)
p <- p + scale_x_log10(breaks=c(10, 20, 30, 40, 50, 75, 500, 700))
print(p)
dev.off()

[Figure: log10_axis.png]

 


2 thoughts on “Broken axis with ggplot2”

  1. Alexandre

    Thanks a lot Joergsteinkamp for this interesting tutorial! I am facing two problems that you could maybe help me solve:

    1) What if the step value is < 1 (e.g. 0.05)?
    2) I'd like to get a y-axis break. Which parameters do I have to switch? I can't figure it out, despite some tests…

    1. Hi Alexandre,

      the second question is relatively simple to answer: you need to invert the mask, so that the higher values are above the lower ones, and then you need to swap x and y in the aesthetics:

      ## invert the mask
      data.sum$mask = !data.sum$mask
      ## exchange x and y
      p <- ggplot(data.sum, aes(y=value, x=name, col=sens))
      ## jitter along the categorical axis, which is now x
      p <- p + geom_point(size=2.5, position=position_jitter(w=0.15, h=0), alpha=0.8)
      p <- p + facet_grid(mask~., scales="free", space="free")
      p <- p + theme(strip.background = element_blank(), strip.text.y = element_blank())
      p <- p + theme(axis.text.x=element_text(angle = -90, hjust = 0))
      print(p)

      The first question is a bit trickier, due to the 'floor' and 'ceiling' commands to get pretty labels. You can start with the provided data, divide it by 1000 and remove the 'floor' and 'ceiling' commands. However, then you get odd axis labels. You must try to find a way to get them nicer yourself. Here are the three new lines, replace them at their occurrence:

      scale <- min.value.desert / max.value.other
      step <- 0.01
      labels <- append(labels, scale * seq(from=(up.start + step) / step * step, length.out=length(breaks) - length(labels), by=step))

      Then you must find where nice breaks for the labels would be.
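One possible way to look for such breaks is base R's `pretty()`, applied separately to the values on each side of the break. The sketch below is not the method from the post. All numbers are invented, and `scale.factor`, `ticks.other`, `ticks.desert`, `breaks.new` and `labels.new` are hypothetical names.

```r
## Hypothetical values standing in for the data divided by 1000
other  <- c(0.012, 0.048, 0.075)   # ordinary ecosystems
desert <- c(0.520, 0.690)          # outlier ecosystem, original units

## non-integer compression factor, as in the modified 'scale' line above
scale.factor <- min(desert) / max(other)

## candidate tick values in original units, chosen separately for each part
ticks.other  <- pretty(other, n = 4)
ticks.desert <- pretty(desert, n = 3)

## keep only ticks that lie on the correct side of the break
ticks.other  <- ticks.other[ticks.other <= max(other)]
ticks.desert <- ticks.desert[ticks.desert >= min(desert)]

## positions on the compressed axis and the text printed at each position
breaks.new <- c(ticks.other, ticks.desert / scale.factor)
labels.new <- format(c(ticks.other, ticks.desert))
```

The two vectors could then be passed to `scale_x_continuous(breaks=breaks.new, labels=labels.new)` in place of `breaks` and `labels`. A tick that falls very close to the break may still need to be removed by hand.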
