21

Is there a fast way to convert latitude and longitude coordinates to State codes in R? I've been using the zipcode package as a look up table but it's too slow when I'm querying lots of lat/long values

If not in R is there any way to do this using google geocoder or any other type of fast querying service?

Thanks!

Michael
  • 7,065
  • 18
  • 45
  • 74
  • 2
    see also my answer here, using `ggmap::revgeocode` : https://stackoverflow.com/questions/46150851/how-to-get-california-county-location-from-latitude-and-longitude-information/46151310#46151310 – Moody_Mudskipper Sep 11 '17 at 08:51

5 Answers5

44

Here are two options, one using sf and one using sp package functions. sf is the more modern (and, here in 2020, recommended) package for analyzing spatial data, but in case it's still useful, I am leaving my original 2012 answer showing how to do this with sp-related functions.


Method 1 (using sf):

library(sf)
library(spData)

## pointsDF: A data.frame whose first column contains longitudes and
##           whose second column contains latitudes.
##
## states:   An sf MULTIPOLYGON object with 50 states plus DC.
##
## name_col: Name of a column in `states` that supplies the states'
##           names.
lonlat_to_state <- function(pointsDF,
                            states = spData::us_states,
                            name_col = "NAME") {
    ## Convert points data.frame to an sf POINTS object
    pts <- st_as_sf(pointsDF, coords = 1:2, crs = 4326)

    ## Transform spatial data to some planar coordinate system
    ## (e.g. Web Mercator) as required for geometric operations
    states <- st_transform(states, crs = 3857)
    pts <- st_transform(pts, crs = 3857)

    ## Find names of state (if any) intersected by each point
    state_names <- states[[name_col]]
    ii <- as.integer(st_intersects(pts, states))
    state_names[ii]
}

## Test the function with points in Wisconsin, Oregon, and France
testPoints <- data.frame(x = c(-90, -120, 0), y = c(44, 44, 44))
lonlat_to_state(testPoints)
## [1] "Wisconsin" "Oregon"    NA

If you need higher resolution state boundaries, read in your own vector data as an sf object using sf::st_read() or by some other means. One nice option is to install the rnaturalearth package and use it to load a state vector layer from rnaturalearthhires. Then use the lonlat_to_state() function we just defined as shown here:

library(rnaturalearth)
us_states_ne <- ne_states(country = "United States of America",
                          returnclass = "sf")
lonlat_to_state(testPoints, states = us_states_ne, name_col = "name")
## [1] "Wisconsin" "Oregon"    NA         

For very accurate results, you can download a geopackage containing GADM-maintained administrative borders for the United States from this page. Then, load the state boundary data and use them like this:

USA_gadm <- st_read(dsn = "gadm36_USA.gpkg", layer = "gadm36_USA_1")
lonlat_to_state(testPoints, states = USA_gadm, name_col = "NAME_1")
## [1] "Wisconsin" "Oregon"    NA        

Method 2 (using sp):

Here is a function that takes a data.frame of lat-longs within the lower 48 states, and for each point, returns the state in which it is located.

Most of the function simply prepares the SpatialPoints and SpatialPolygons objects needed by the over() function in the sp package, which does the real heavy lifting of calculating the 'intersection' of points and polygons:

library(sp)
library(maps)
library(maptools)

# The single argument to this function, pointsDF, is a data.frame in which:
#   - column 1 contains the longitude in degrees (negative in the US)
#   - column 2 contains the latitude in degrees

lonlat_to_state_sp <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per state (plus DC, minus HI & AK)
    states <- map('state', fill=TRUE, col="transparent", plot=FALSE)
    IDs <- sapply(strsplit(states$names, ":"), function(x) x[1])
    states_sp <- map2SpatialPolygons(states, IDs=IDs,
                     proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Convert pointsDF to a SpatialPoints object 
    pointsSP <- SpatialPoints(pointsDF, 
                    proj4string=CRS("+proj=longlat +datum=WGS84"))

    # Use 'over' to get _indices_ of the Polygons object containing each point 
        indices <- over(pointsSP, states_sp)

    # Return the state names of the Polygons object containing each point
    stateNames <- sapply(states_sp@polygons, function(x) x@ID)
    stateNames[indices]
}

# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))

lonlat_to_state_sp(testPoints)
[1] "wisconsin" "oregon" # IT WORKS
Josh O'Brien
  • 148,908
  • 25
  • 332
  • 435
  • 2
    I had to change wgs84 to WGS84 to get this example to work. – lever Mar 08 '16 at 23:11
  • @lever Thanks for pointing that out. Wonder when (and where) that was changed. In any case, I've now edited to fix it. – Josh O'Brien Mar 08 '16 at 23:17
  • Is there a quick edit to get this to go from lat/lon to Zip Codes instead of states? – Agustín Indaco Sep 18 '18 at 20:37
  • 1
    @AgustínIndaco Not quick, since in my code, the polygon layer of states is provided by the **maps** package, and it has no corresponding layer of zip code boundaries. If you find such a layer, you could of course adapt my code to work with it. Alternatively, you may want to look into "reverse geocoding" as, e.g., [here](https://stackoverflow.com/a/37117595/980833). – Josh O'Brien Sep 18 '18 at 21:06
  • 1
    I've found this answer produces results that lack adequate precision for some applications. For example, `38.83226,-76.98946` is coded as Maryland, not the District of Columbia. And `34.97982,-85.42203` is coded as Tennessee, not Georgia. If you're working with 15,000 points, as I am, this method is going to produce a lot of incorrect results (about 900 in the data set I'm working with, I'd estimate). I'm not sure what a better solution would be, however. – LaissezPasser Jul 31 '19 at 18:49
  • 1
    This also works well for county by changing "state" to "county". – Neal Barsch Apr 28 '20 at 15:03
  • @LaissezPasser Thanks for mentioning that. For much more accurate results, you can use the code I just now posted under **Method 1** and the [GADM](https://en.wikipedia.org/wiki/GADM)-maintained dataset mentioned near the bottom of that section. – Josh O'Brien Jun 18 '20 at 00:54
12

You can do it in a few lines of R.

library(sp)
library(rgdal)
#lat and long
Lat <- 57.25
Lon <- -9.41
#make a data frame
coords <- as.data.frame(cbind(Lon,Lat))
#and into Spatial
points <- SpatialPoints(coords)
#SpatialPolygonDataFrame - I'm using a shapefile of UK counties
counties <- readOGR(".", "uk_counties")
#assume same proj as shapefile!
proj4string(points) <- proj4string(counties)
#get county polygon point is in
result <- as.character(over(points, counties)$County_Name)
tingenek
  • 131
  • 1
  • 4
3

See ?over in the sp package.

You'll need to have the state boundaries as a SpatialPolygonsDataFrame.

mdsumner
  • 26,859
  • 5
  • 76
  • 87
0

Example data (polygons and points)

library(raster)
pols <- shapefile(system.file("external/lux.shp", package="raster"))
xy <- coordinates(p)

Use raster::extract

extract(p, xy)

#   point.ID poly.ID ID_1       NAME_1 ID_2           NAME_2 AREA
#1         1       1    1     Diekirch    1         Clervaux  312
#2         2       2    1     Diekirch    2         Diekirch  218
#3         3       3    1     Diekirch    3          Redange  259
#4         4       4    1     Diekirch    4          Vianden   76
#5         5       5    1     Diekirch    5            Wiltz  263
#6         6       6    2 Grevenmacher    6       Echternach  188
#7         7       7    2 Grevenmacher    7           Remich  129
#8         8       8    2 Grevenmacher   12     Grevenmacher  210
#9         9       9    3   Luxembourg    8         Capellen  185
#10       10      10    3   Luxembourg    9 Esch-sur-Alzette  251
#11       11      11    3   Luxembourg   10       Luxembourg  237
#12       12      12    3   Luxembourg   11           Mersch  233
Robert Hijmans
  • 24,233
  • 3
  • 36
  • 43
0

It's very straightforward using sf:

library(maps)
library(sf)

## Get the states map, turn into sf object
US <- st_as_sf(map("state", plot = FALSE, fill = TRUE))

## Test the function using points in Wisconsin and Oregon
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))

# Make it a spatial dataframe, using the same coordinate system as the US spatial dataframe
testPoints <- st_as_sf(testPoints, coords = c("x", "y"), crs = st_crs(US))

#.. and perform a spatial join!
st_join(testPoints, US)


         ID        geometry
1 wisconsin  POINT (-90 44)
2    oregon POINT (-120 44)
HAVB
  • 1,640
  • 1
  • 19
  • 27