24

I have survey data of species richness that was taken at various sites in the Chesapeake Bay, USA, and I would like to graphically present the data as a "heat map."

I have a dataframe of lat/long coordinates and richness values, which I converted into a SpatialPointsDataFrame and used the autoKrige() function from the automap package to generate the interpolated values.

First, can anyone comment as to whether I am correctly implementing the autoKrige() function?

Second, I am having trouble plotting the data and overlaying a map of the region. Alternately, could I specify the interpolation grid to reflect the borders of the Bay (as suggested here)? Any thoughts on how I might do that and where I might get that information? Supplying the grid to autoKrige() appears easy enough.


EDIT: Thanks to Paul for his super helpful post! Here is what I have now. Having trouble getting ggplot to accept both the interpolated data and the map projection:

require(rgdal)
require(automap)
#Generate lat/long coordinates and richness data
set.seed(6)
df=data.frame(
  lat=sample(seq(36.9,39.3,by=0.01),100,rep=T),
  long=sample(seq(-76.5,-76,by=0.01),100,rep=T),
  fd=runif(10,0,10))
initial.df=df

#Convert dataframe into SpatialPointsDataFrame
coordinates(df)=~long+lat

#Project latlong coordinates onto an ellipse
proj4string(df)="+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
#+proj = the type of projection (lat/long)
#+ellps and +datum = the irregularity in the ellipse represented by planet earth

#Transform the projection into Euclidean distances
project_df=spTransform(df, CRS("+proj=merc +zone=18s +ellps=WGS84 +datum=WGS84")) #projInfo(type="proj")

#Perform the interpolation using kriging
kr=autoKrige(fd~1,project_df)
#Extract the output and convert to dataframe for easy plotting with ggplot2
kr.output=as.data.frame(kr$krige_output)
#Plot the output
#Load the map data for the Chesapeake Bay
cb=data.frame(map("state",xlim=range(initial.df$long),ylim=range(initial.df$lat),plot=F)[c("x","y")])

ggplot()+
  geom_tile(data=kr.output,aes(x=x1,y=x2,fill=var1.pred))+  
  geom_path(data=cb,aes(x=x,y=y))+
  coord_map(projection="mercator")
Community
  • 1
  • 1
jslefche
  • 3,839
  • 7
  • 34
  • 47
  • 1
    the example code gives me an empty plot with ggplot_0.9.3 in R 2.15 and an error from the plot in R 3.0.0 (no applicable method for depth) – David LeBauer Apr 23 '13 at 15:30

1 Answers1

47

I have a number of remarks on your post:

Using kriging

I see that you are using geostatistics to construct your heatmap. You could also consider other interpolation techniques such as splines (e.g. Thin plate splines in the fields package). These make less assumptions about the data (e.g. stationarity), and can also visualize your data just fine. The reduction in the number of assumptions might help in case you send it to a journal, then you have less to explain to the reviewers. You can also compare a few interpolation techniques if you want, see a report I wrote for some tips.

Data projection

I see that you are using lat long coordinates for kriging. Edzer Pebesma (author of gstat) remarked that there are no variogram models that are suitable for lat lon coordinates. This is because in lat lon the distances are not straight (i.e. Euclidean), but over a sphere, (i.e. Great circle distances). There are no covariance functions (or variogram models) that are valid for spherical coordinates. I recommend projecting them using spTransform from the rgdal package before using automap.

The rgdal package uses the proj4 projection library to perform the calculations. To project your data you first need to define its projection:

proj4string(df) = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"

The proj4 string on the right hand side of the expression above defines the type of projection (+proj), the ellips that was used (+ellps) and the datum (+datum). To understand what these terms mean, you have to imagine the Earth as a potato. The Earth is not perfectly spherical, this is defined by the ellips. Neither is the Earth a perfect ellipsoid, but the surface is more irregular. This irregularity is defined by the datum. See also this article on Wikipedia.

Once you have the projection defined, you can use spTransform:

project_df = spTransform(df, CRS("+proj= etcetc"))

where CRS("+proj etc") defines the target projection. Which projection is appropriate depends on your geographical location and the size of your study area.

Plotting with ggplot2

For adding polygons or polylines to ggplot, please to a look the documentation of coord_map. This includes an example of using the maps package to plot country boundaries. If you need to load for example shapefiles for your study area, you can do so using rgdal. Do remember that ggplot2 works with data.frame's, not SpatialPolygons. You can transform SpatialPolygons to data.frame using:

poly_df = fortify(poly_Spatial)

See also this function I created to plot spatial grids. It works directly on SpatialGrids/Pixels. Note that you need to source one or two additional files from that repository (continuousToDiscrete).

Creating interpolation grid

I created automap to generate an output grid when none was specified. This is done by creating a convex hull around the data points, and sampling 5000 points inside it. The boundaries of the prediction area, and the number of points sampled in it (and thus the resolution) is quite arbitrary. For a specific application the shape of the prediction area can be derived from a polygon, using spsample to sample points inside the polygon. How many points to sample, and thus the resolution, depends on two things:

  • the kind of data you have, For example, if your data is very smooth, there is not much point in raising the resolution really high in comparison to this smoothness. Alternatively, if your data has many small scale strcutures, you need a high resolution. This is only possible ofcourse if you have the observations to support this high resolution.
  • the density of data. If your data is more dense, you can raise the resolution.

If you use your interpolated map for subsequent analyses, getting the resolution right is important. If you use the map purely for visuatlisation purposes, this is less important. Note however that in both cases a too high resolution can be misleading as to the accuracy of your predictions, and that a too low resolution does not do justice to the data.

Paul Hiemstra
  • 56,833
  • 11
  • 132
  • 142
  • 1
    We're in Paul's neck of the woods, now! +1 for a well written (and sourced) answer. – Roman Luštrik Apr 08 '12 at 11:36
  • Extraordinarily well-written and informative answer! Thank you very much for providing such detail for someone (me) who is (obviously) very new to this. I have one question: where can I find arguments that would be suitable for `CRS()` in `spTransform` for the scale that I am working with? (e.g., the Chesapeake Bay: 300 x 50 km in the Northwestern Mid-Atlantic region) – jslefche Apr 08 '12 at 16:43
  • You can akways use UTM, which covers the whole globe. It does this by projecting in zones across the globe. Once you know the zone number, you are good to go. For the proj4string, just google `utm proj4`.Alternatively, you can use a local coordinate system. Googling for you country or area plus `coordinate system` should. Or look what existing maps of the area use. – Paul Hiemstra Apr 08 '12 at 17:21
  • Great! Let's suppose I use +proj=UTM +zone=18s. I'm now having trouble getting ggplot to accept both the interpolated data and the map data. See my edited post above – jslefche Apr 08 '12 at 17:51
  • Ggplot2 uses a different projection library than rgdal as far as I know. So mercator is not equal to UTM, which is a special mercator. The idea is that once all your data (polygon an grid) are projected into the same projection using proj4 you do not need to provide additional info to ggplot. – Paul Hiemstra Apr 08 '12 at 18:01
  • Your best bet is to create the grid in the correct projection and change the polygon to that projection. Reprojecting a grid leads to resampling, which rgdal does not do. – Paul Hiemstra Apr 08 '12 at 18:03
  • 1
    Roger that. Good to know that PROJ.4 projections are not the same as the ones in `coord_map`. I will convert to a data.frame using `rgdal` and update the post (with code and the resulting plot) for others to reference down the line. Thanks again, Paul! – jslefche Apr 08 '12 at 18:39