I have two tables of Twitter API data bound together, and I want a function that determines whether the text contains the word f150. If it does, it should return ford; if not, it should search the text for the word Silverado and, if Silverado is found, return chevy. All others should be null.

I saw this online, but it isn't working for me. Also, are there wildcards in R like in SQL?

tweet_sentiments <-
 tweet_sentiments %>% 
 mutate(vehicle = if(text = "f150") {ford}
     else_if(text= "silverado"){Chevy})
r2evans
  • Can you please show a small reproducible example and expected output – akrun May 03 '21 at 17:05
  • (1) `if` in a mutate is *probably* wrong, consider `ifelse`. (2) `text = "f150"` is an assignment not a comparison, perhaps you mean `text == "f150"`. (3) Wildcards? Yes, try `grepl` and look into regex (https://stackoverflow.com/a/22944075/3358272). – r2evans May 03 '21 at 17:08

2 Answers

  1. It is legal to use if in a mutate call, but the way you use it here is wrong: if looks at a single TRUE/FALSE value, whereas text is a whole column (a vector). To condition on a vector, use ifelse (base R) or if_else (in dplyr).

    The first change to your code is something like:

    tweet_sentiments %>% 
      mutate(
        vehicle = ifelse(...)
      )
    
  2. text = "f150" is an assignment, not a comparison; for equality you need ==. Progressive code changes:

    tweet_sentiments %>% 
      mutate(
        vehicle = if_else(text == "f150", "Ford",
                          if_else(text == "silverado", "Chevy", ...))
      )
    
  3. You need a default value, one that is assigned if text is neither "f150" nor "silverado". Options include a literal string like "unknown", or the R-idiomatic NA (which stands for "not available", i.e. a missing value). Code progress:

    tweet_sentiments %>% 
      mutate(
        vehicle = if_else(text == "f150", "Ford",
                          if_else(text == "silverado", "Chevy", NA_character_))
      )
    

    (R has several typed NA constants: NA, NA_integer_, NA_real_, NA_character_ and NA_complex_. if_else is rather particular about its true= and false= arguments having the same class, although dplyr 1.1.0 and later relax this and accept a plain NA. If you used ifelse instead, you could have kept it at NA, at the risk of several of the other problems that base::ifelse presents. It has baggage.)

  4. You mentioned wildcards, which suggests that you may want to find "f150" as a substring in the text, in which case we will want grepl. Code progress:

    tweet_sentiments %>% 
      mutate(
        vehicle = if_else(grepl("f150", text), "Ford",
                          if_else(grepl("silverado", text), "Chevy", NA_character_))
      )
    

    grepl supports ignore.case= as well, in case you want to consider case-insensitive comparisons.

  5. Lastly, working this back around to a dplyr-idiomatic way of doing things ... whenever I see more than one nested ifelse (...), I immediately recommend dplyr::case_when. For instance, if you add another car type or two, it gets unwieldy:

    tweet_sentiments %>% 
      mutate(
        vehicle = if_else(grepl("f150", text), "Ford",
                          if_else(grepl("silverado", text), "Chevy",
                                  if_else(grepl("RAV4", text), "Toyota", NA_character_)))
      )
    

    but can be cleaned up (indents and parens) as:

    tweet_sentiments %>% 
      mutate(
        vehicle = case_when(
          grepl("f150", text) ~ "Ford",
          grepl("silverado", text) ~ "Chevy",
          grepl("RAV4", text) ~ "Toyota",
          TRUE ~ NA_character_
        )
      )
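
A small reproducible version of the case_when approach, combined with the ignore.case= option mentioned in step 4, might look like the sketch below. The sample tweets are made up for illustration (the real tweet_sentiments table is not shown in the question), and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical sample data standing in for the real tweet_sentiments table.
tweet_sentiments <- tibble(
  text = c("Just bought an F150!", "my silverado is great", "I like trains")
)

tagged <- tweet_sentiments %>%
  mutate(
    vehicle = case_when(
      # ignore.case = TRUE lets "F150" and "f150" both match.
      grepl("f150", text, ignore.case = TRUE) ~ "Ford",
      grepl("silverado", text, ignore.case = TRUE) ~ "Chevy",
      # Fallback for rows that match neither pattern.
      TRUE ~ NA_character_
    )
  )

tagged$vehicle
# "Ford" "Chevy" NA
```

case_when evaluates the conditions in order and uses the first one that is TRUE for each row, so a tweet mentioning both trucks would be tagged "Ford" here.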
    

Since you asked about "wildcards": if you don't know regular expressions, or the difference between regex and glob-style patterns, I suggest you look at https://stackoverflow.com/a/22944075/3358272. See also ?glob2rx, which converts glob-style patterns to regex, since the grep* functions only deal with regex or fixed strings.
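
As a rough illustration of the glob-to-regex route, the sketch below uses only base R (glob2rx is in utils, which is attached by default). The tweets vector is invented for the example; the glob pattern "*f150*" plays the role that '%f150%' would play in a SQL LIKE.

```r
# Hypothetical example strings; not taken from the original data.
tweets <- c("my f150 rocks", "love my silverado", "f150")

# Convert the glob-style wildcard pattern into a regular expression.
pattern <- glob2rx("*f150*")

# grepl() only understands regex (or fixed strings), so pass the converted pattern.
is_ford <- grepl(pattern, tweets)
is_ford
# TRUE FALSE TRUE
```

For a plain substring search like this one, grepl("f150", tweets) gives the same result without any conversion; glob2rx is mainly useful when the pattern already exists in glob form.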

r2evans

1) We can use case_when

tweet_sentiments %>%
      mutate(vehicle = case_when(text == 'f150' ~ 'ford',
              text == 'silverado' ~ 'Chevy'))

2) If it is a substring, then use str_detect

library(stringr)
tweet_sentiments %>%
      mutate(vehicle = case_when(str_detect(text, 'f150') ~ 'Ford',
            str_detect(text, 'silverado') ~ 'Chevy'))

3) another option is %like%

 library(data.table)
 tweet_sentiments %>%
      mutate(vehicle = case_when(text %like% 'f150' ~ 'ford',
                      text %like% 'silverado' ~ 'Chevy'))

4) another option is rowwise with if/else

tweet_sentiments %>%
     rowwise %>%
     mutate(vehicle = if(str_detect(text, 'f150')) 'ford' 
        else if(str_detect(text, 'silverado')) 'Chevy' else NA_character_) %>%
     ungroup

5) or we can use regex_left_join from fuzzyjoin with a lookup table of patterns (this assumes tweet_sentiments has no pattern or vehicle column yet)

keydat <- tibble(pattern = c('f150', 'silverado'), vehicle = c('ford', 'Chevy'))
library(fuzzyjoin)
tweet_sentiments %>%
     regex_left_join(keydat, by = c(text = 'pattern')) %>%
     select(-pattern)
akrun