This is a continuation of my previous post on making maps in R with ggplot2.

The above map looks nice and all, but what if we want to zoom in on a more populated area like Atlanta? We could just use xlim() and ylim() to zoom this map into a specified latitude and longitude, but I'd like to be able to see where exactly in the city I am. This is where the ggmap package comes in handy.

The first step is to use the get_map() function to retrieve a map from the Google Maps API. The zoom parameter specificies how far you want the map to be zoomed in. In general, 3 is a continent, 10 is a city, and 21 is a building. Let's get a quick map of the metro Atlanta area.

map <- get_map("Atlanta", zoom=10)
ggmap(map)

I don't really like this particular look, but get_map() comes with a variety of options for the desired style of map (terrain, satellite, roadmap, hybrid). Let's see what the roadmap style looks like.

map <- get_map("Atlanta", zoom=10, maptype="roadmap")
ggmap(map)

Well that looks familiar. I'll provide further examples later on, but for now let's stick with this style. So, how do we get data on this map? The best feature of ggmap() is that it returns a ggplot object, which means we can apply everything we learned in the previous tutorial (with a few modifications).

I have no idea why, but I haven't been able to get the previous combination of geom_map() and geom_path() to work properly with ggmap(), so it's time to talk about a different method for creating maps. This method involves joining our census data with the shapefile data to create a single dataframe, and then plotting that dataframe using geom_polygon(). All of the other additions are the same as last time (color scale, labels, etc.). I chose a different Color Brewer palette mostly just for fun. Note that the ggmap package comes with a function called theme_nothing() which replaces all the code we had before that stripped the unnecessary elements from the plot.

library(plyr)
colnames(data)[1] <- "id"
d <- join(tract, data, by="id")
ggmap(map) +
    geom_polygon(data=d, aes(x=long, y=lat, group=group, fill=percent), colour=NA, alpha=0.5) +
    scale_fill_gradientn(colours=rev(brewer.pal(9,"RdYlGn")), labels=percent) +
    labs(fill="") +
    theme_nothing(legend=TRUE)

We have colors on our map now, but something is clearly off at the edges of our image. When the boundaries of a census tract intersect with the boundaries of the map, bad things happen. We need to somehow subset our data to only include census tracts within the boundaries of the map, but we also want the census tracts on the edges to still be visible. We can accomplish this with the following code.

box <- as(extent(as.numeric(attr(map, 'bb'))[c(2,4,1,3)] + c(.001,-.001,.001,-.001)), "SpatialPolygons")
tract <- readShapePoly("gz_2010_13_140_00_500k.shp")
tractSub <- gIntersection(tract, box, byid=TRUE, id=as.character(tract$TRACT))
tractSub <- fortify(tractSub, region="TRACT")
d <- join(tractSub, data, by="id")

The first line creates a bounding box that we use with the gIntersection() function in order to properly subset the census tracts. The bounding box is based on the size of the map (with an adjustment of .001), and we calculate the intersection between the box and the census tract shapefile. In the end, we end up with the same dataframe as before except with a modified shapefile.

ggmap(map) +
    geom_polygon(data=d, aes(x=long, y=lat, group=group, fill=percent), colour=NA, alpha=0.5) +
    scale_fill_gradientn(colours=rev(brewer.pal(9,"RdYlGn")), labels=percent) +
    labs(fill="") +
    theme_nothing(legend=TRUE)

We fixed the problem with the borders of our map, but I'm still not completely satisfied. The jagged edges of the census tract boundaries aren't very pleasing to the eye and they make it slightly more difficult to see what's going on. Let's smooth this out a bit.

data <- data[data$id %in% unique(tractSub$id),]
tract <- readShapePoly("gz_2010_13_140_00_500k.shp")
centroids <- data.frame(coordinates(tract), tract$GEO_ID)
colnames(centroids) <- c("long", "lat", "id")
centroids$id <- substring(centroids$id,10,20)
d <- join(data, centroids)
d <- d[complete.cases(d),]

First, we subset the data according to the census tract dataframe that we just created. We're going to be interpolating between every data point, so the less data we have the better. Next, we use the coordinates() function to extract the centroids and join that with our dataframe. d now contains the centroid latitude, centroid longitude, and percent value for each census tract in metro Atlanta.

We will use the interp() function from the akima package to fill in the missing area using the values at the centroids.

library(akima)
akima_li <- interp(x=d$long,
    y=d$lat,
    z=d$percent,
    yo=seq(min(d$lat), max(d$lat), length=250),
    xo=seq(min(d$long), max(d$long), length=250),
    linear=FALSE,
    extrap=TRUE)
dInterp <- data.frame(expand.grid(x=akima_li$x, y=akima_li$y), z=c(akima_li$z))

This isn't a statistics post so I'm going to treat this like magic for purposes of this tutorial. The most important value here is the length parameter which basically controls the resolution of our heatmap. The default value is 40, and I would recommend setting it to around 20 for testing purposes. I set it to 250 when I know everything works and I'm ready for my final output because it can take quite a while to run.

Finally, we'll use the geom_tile() function to map the data we have in dInterp. Nothing too crazy here, check the ggplot docs if you're confused about this part.

ggmap(map) +
    geom_tile(data=dInterp, aes(x, y, fill=z), alpha=0.5, color=NA) +
    scale_fill_gradientn(colours=rev(brewer.pal(9,"RdYlGn")), labels=percent) +
    labs(fill="") +
    theme_nothing(legend=TRUE)

We're done! We now have a nice looking heatmap of the percentage of people in metro Atlanta who do not have health insurance.