
I have a function that behaves incorrectly when used inside the `mutate()` function from the dplyr package. The function takes a UK postcode and returns its postal area. It works fine with individual postcodes or vectors of postcodes.

Here is the function:

pArea_parse <- function(x) {
  z <- any(grep('[A-Z][A-Z]', substr(x, 1, 2)))
  y <- any(grep('[A-Z][0-9]', substr(x, 1, 2)))

  if (z) {
    return(substr(x, 1, 2))
  } else if (y) {
    return(substr(x, 1, 1))
  } else if (!y & !z) {
    return(NA)
  }
}

It works:

x <- "B30 1AA" # plucked randomly from a postcode site
pArea_parse(x)
# [1] "B"

Here is some sample data:

test <- data.frame(id = c(1, 2, 3, 4), post_code = c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))

Here is my dplyr code:

library(dplyr)
test %>% mutate(postal_area = pArea_parse(post_code))

Instead of returning just the first letter when a postcode starts with a letter followed by a number, it returns the letter and the number, even though this doesn't happen with an individual postcode or a vector of postcodes.

id post_code postal_area
1   B30 1AA          B3
2   B30 3FT          B3
3   B30 3AZ          B3
4   BA1 8TU          BA

How can a function do something it's not programmed to do when used in conjunction with mutate? I am stumped!

CClarke
  • I think the issue is that your function does not work correctly with vectors. – Kerry Jackson Aug 16 '18 at 16:37
  • I think you probably wanted to structure this around `ifelse()` or, even better, `case_when()` rather than a traditional `if`/`else` clause. The former are vectorized. – joran Aug 16 '18 at 16:39
  • How does one properly vectorize a function? Why would not vectorizing my function produce the observed behaviour? Thanks. – CClarke Aug 16 '18 at 16:41
  • If you use `purrr::map` with your function and `tidyr::unnest`, you could avoid vectorizing: `test %>% mutate(postal_area = purrr::map(post_code, pArea_parse)) %>% tidyr::unnest(postal_area)` – AndS. Aug 16 '18 at 16:45
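joran's `ifelse()` suggestion can be sketched in base R without dplyr. This is a hypothetical variant (`pArea_parse_ifelse` is an illustrative name, not code from the thread) that keeps the question's regular expressions but replaces `any()` plus `if`/`else` with element-wise `grepl()` and nested `ifelse()`:

```r
# Hypothetical base-R variant of pArea_parse() built on ifelse().
# grepl() and substr() return one value per element of x, and ifelse()
# chooses between them element by element, so the output has the same
# length as the input.
pArea_parse_ifelse <- function(x) {
  x <- as.character(x)               # also copes with factor columns
  first2 <- substr(x, 1, 2)
  z <- grepl('[A-Z][A-Z]', first2)   # two letters, e.g. "BA1 8TU"
  y <- grepl('[A-Z][0-9]', first2)   # letter then digit, e.g. "B30 1AA"
  ifelse(z, first2,
         ifelse(y, substr(x, 1, 1), NA_character_))
}

pArea_parse_ifelse(c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))
# [1] "B"  "B"  "B"  "BA"
```

Postcodes matching neither pattern come back as `NA_character_`, so the column stays character.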


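Your use of `any()` and `if`/`else` makes your function non-vectorized. That is, if you pass in a vector of values, you do not get the right vector of values out. `mutate()` calls your function once with the entire `post_code` column, and `any()` collapses the per-element matches into a single `TRUE` or `FALSE`. Because one postcode (`"BA1 8TU"`) starts with two letters, `z` is `TRUE`, so the first branch returns `substr(x, 1, 2)` for every element. This is not specific to `mutate()`. If you try your function outside of `mutate()`, you'll get the same result: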

pArea_parse(c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))
# [1] "B3" "B3" "B3" "BA"
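A minimal sketch of the mechanism, using the sample postcodes from the question: `grepl()` gives one logical per element, whereas `grep()` returns the matching positions and `any()` reduces them to one value, so the `if` is decided once for the whole vector.

```r
x <- c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU")

# grepl() tests each element separately: one TRUE/FALSE per postcode
grepl('[A-Z][A-Z]', substr(x, 1, 2))
# [1] FALSE FALSE FALSE  TRUE

# grep() returns the positions that match, and any() collapses them to a
# single value, so `if (z)` is evaluated once for the whole vector
grep('[A-Z][A-Z]', substr(x, 1, 2))
# [1] 4
any(grep('[A-Z][A-Z]', substr(x, 1, 2)))
# [1] TRUE

# That single TRUE selects the first branch, which returns two
# characters for every postcode
substr(x, 1, 2)
# [1] "B3" "B3" "B3" "BA"
```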

You can make this easier with the dplyr helper function `case_when()`. For example:

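library(dplyr)

pArea_parse <- function(x) {
  # grepl() returns one TRUE/FALSE per element, unlike any(grep(...))
  z <- grepl('[A-Z][A-Z]', substr(x, 1, 2))
  y <- grepl('[A-Z][0-9]', substr(x, 1, 2))

  # case_when() picks a result element by element
  case_when(z ~ substr(x, 1, 2),
            y ~ substr(x, 1, 1),
            TRUE ~ NA_character_)
}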
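A usage sketch, assuming the `case_when()` version of `pArea_parse()` and the sample data from the question. `stringsAsFactors = FALSE` is an addition here so that `post_code` is character on R versions before 4.0 as well.

```r
library(dplyr)

# Vectorized version: grepl() and case_when() both work element by element
pArea_parse <- function(x) {
  z <- grepl('[A-Z][A-Z]', substr(x, 1, 2))
  y <- grepl('[A-Z][0-9]', substr(x, 1, 2))
  case_when(z ~ substr(x, 1, 2),
            y ~ substr(x, 1, 1),
            TRUE ~ NA_character_)
}

# Sample data from the question; stringsAsFactors = FALSE keeps post_code
# as character on older R versions too
test <- data.frame(id = c(1, 2, 3, 4),
                   post_code = c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"),
                   stringsAsFactors = FALSE)

res <- test %>% mutate(postal_area = pArea_parse(post_code))
res$postal_area
# [1] "B"  "B"  "B"  "BA"
```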
MrFlick
  • Thank you, that worked. Do you know where I could learn about vectorizing functions? I don't really understand why my function didn't work and yours did... – CClarke Aug 16 '18 at 16:45
  • This is the first thing that came up when I googled it: http://alyssafrazee.com/2014/01/29/vectorization.html. There are plenty of others out there as well. Try searching "r vectorize functions". – MrFlick Aug 16 '18 at 16:47
  • Similar answer, but in base R: `pArea_parse – Kerry Jackson Aug 16 '18 at 16:48
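If rewriting the function is not practical, the idea behind AndS's `purrr::map` comment can also be sketched in base R: call the original scalar function once per postcode with `vapply()`. This is a minimal sketch; `pArea_parse_each` is a hypothetical helper name, and `as.character()` is added because the original function returns a logical `NA` for non-matches while `vapply()` here expects a character result.

```r
# Original, non-vectorized function from the question: correct for a
# single postcode, because any() then only looks at one element
pArea_parse <- function(x) {
  z <- any(grep('[A-Z][A-Z]', substr(x, 1, 2)))
  y <- any(grep('[A-Z][0-9]', substr(x, 1, 2)))

  if (z) {
    return(substr(x, 1, 2))
  } else if (y) {
    return(substr(x, 1, 1))
  } else if (!y & !z) {
    return(NA)
  }
}

# Hypothetical wrapper: apply pArea_parse() to one postcode at a time.
# as.character() turns the logical NA into NA_character_ so that every
# result matches FUN.VALUE = character(1).
pArea_parse_each <- function(x) {
  vapply(as.character(x),
         function(p) as.character(pArea_parse(p)),
         FUN.VALUE = character(1),
         USE.NAMES = FALSE)
}

pArea_parse_each(c("B30 1AA", "B30 3FT", "B30 3AZ", "BA1 8TU"))
# [1] "B"  "B"  "B"  "BA"
```

Inside dplyr this would be used as `test %>% mutate(postal_area = pArea_parse_each(post_code))`. Because the function body runs once per element, this is usually slower on large columns than a genuinely vectorized rewrite.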