
I have data on metropolitan areas and want to extract the city information.

An example is

test <- c("Akron, OH METRO AREA","Auburn, NY Micro Area","Boston-Cambridge, MA-NH")

And I want it to look like

"Akron, OH", "Auburn, NY", "Boston-Cambridge, MA"

So I want just the city and state.

MrFlick
user3304359

2 Answers


An option is sub from base R: match the comma followed by one or more spaces (\\s+) and then the upper-case letters ([A-Z]+), capture that as a group ((...)), and match the rest of the string with .*. In the replacement, specify the backreference (\\1) to the captured group, so everything after the state abbreviation is dropped.

sub("(,\\s+[A-Z]+).*", "\\1", test)
#[1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
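The same sub idea extends to two capture groups if the city and the state are wanted as separate columns. This is a minimal sketch, assuming every entry follows the "City, ST ..." layout and that the state is the first two-letter upper-case code after the comma (for "MA-NH" it keeps only "MA"); the name pattern is just an illustrative variable.

```r
test <- c("Akron, OH METRO AREA", "Auburn, NY Micro Area", "Boston-Cambridge, MA-NH")

# Group 1: everything before the comma (the city part)
# Group 2: the first two upper-case letters after ", " (assumed to be the state code)
# .*$ consumes the rest of the string so the whole input is replaced
pattern <- "^([^,]+),\\s+([A-Z]{2}).*$"

city  <- sub(pattern, "\\1", test)
state <- sub(pattern, "\\2", test)

city
#[1] "Akron"            "Auburn"           "Boston-Cambridge"
state
#[1] "OH" "NY" "MA"
paste(city, state, sep = ", ")
#[1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
```

Anchoring the pattern with ^ and $ means a string that does not fit the assumed layout is returned unchanged rather than partially edited.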
akrun

An easy option is stringr::str_extract:

test <- c("Akron, OH METRO AREA","Auburn, NY Micro Area","Boston-Cambridge, MA-NH")
stringr::str_extract(test, "[^,]+, .{0,2}")
# [1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"

We match one or more characters that are not a comma, then a comma and a space, then up to two more characters of any kind.
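If stringr is not available, base R can apply the same pattern with regexpr() and regmatches(). This is a sketch of an alternative, not part of the answer above; the variable name res is illustrative.

```r
test <- c("Akron, OH METRO AREA", "Auburn, NY Micro Area", "Boston-Cambridge, MA-NH")

# regexpr() locates the first match in each string;
# regmatches() pulls out the matched substrings.
# Same pattern as the str_extract() answer: non-commas, ", ", then up to two characters.
res <- regmatches(test, regexpr("[^,]+, .{0,2}", test))
res
#[1] "Akron, OH"            "Auburn, NY"           "Boston-Cambridge, MA"
```

One difference: regmatches() silently drops strings with no match, whereas str_extract() returns NA for them, so the result can be shorter than the input. If the state is always a two-letter code, "[^,]+, [A-Z]{2}" is a stricter pattern than ".{0,2}".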

MrFlick
  • Thanks! I always forget stringr cause I don't have much experience with regex. Makes sense! – user3304359 Aug 27 '19 at 20:09
  • Another one for you? If I have "Virginia Beach-Norfolk-Newport News, VA", how can I make it into 3 rows: "Virginia Beach, VA", "Norfolk, VA", "Newport News, VA"? – user3304359 Aug 27 '19 at 20:11
  • @user3304359 That's a different issue than what you've described here. Maybe open up a different question. – MrFlick Aug 27 '19 at 20:14
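The follow-up in the comments was left open. One possible sketch, assuming the cities are separated by "-", that no city name itself contains a hyphen, and that the state is the first two-letter code after the comma; metro, cities, state and result are illustrative names.

```r
metro <- "Virginia Beach-Norfolk-Newport News, VA"

# State: first two upper-case letters after ", " (assumes a two-letter code)
state <- sub("^[^,]+,\\s+([A-Z]{2}).*$", "\\1", metro)

# Cities: drop everything from the comma on, then split on "-"
# (assumes no individual city name contains a hyphen)
cities <- strsplit(sub(",.*$", "", metro), "-", fixed = TRUE)[[1]]

# One "City, ST" string per city; wrap in data.frame() if rows are needed
result <- paste(cities, state, sep = ", ")
result
#[1] "Virginia Beach, VA" "Norfolk, VA"        "Newport News, VA"
```

Note that the same split would also turn "Boston-Cambridge, MA-NH" into "Boston, MA" and "Cambridge, MA", which may or may not be what is wanted.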