
I have a data set with a category column and a subcategory column. However, some of the category values are incorrect, and I want to fix them based on the subcategories.

For example, we have two categories, A and B, with subcategories A_sub and B_sub. However, some category entries were mistakenly entered as "Others" even though their subcategory is "A_sub" (see code below). Is there an elegant way to change "Others" to "A" in the category column?

data <- data.frame("category" = c("A", "B", "B", "A", "Others", "Others"), "subcategory" = c("A_sub", "B_sub", "A_sub", "A_sub", "A_sub", "A_sub"))

The expected output is:

[image: expected output]

Thank you!

Liwei Dong
  • Thanks for including a reproducible example! What have you tried so far and why didn't it work? – Andrew Jul 09 '19 at 18:20
  • Base R answer: `data$category[data$category == "Others"]` – InfiniteFlash Jul 09 '19 at 18:22
  • Also, this is a duplicate of the [following](https://stackoverflow.com/a/28013895/5874001) – InfiniteFlash Jul 09 '19 at 18:23
  • I tried to correct it manually in Excel since there are only 140 categories, but I am wondering whether there is an elegant way to do it faster in R. The issue I am having right now is finding a quick way to create the backwards mapping (from subcategory to category), i.e., how to automatically identify the correct category for a subcategory based on other rows of that subcategory that have the correct category. – Liwei Dong Jul 09 '19 at 18:26
  • Thanks @InfiniteFlashChess, maybe my sample code was a little misleading. I changed it so that the 3rd row has the wrong category too. So essentially, I am trying to figure out how to check whether the mapping between category and subcategory is correct. The actual data has 140 categories. Thanks! – Liwei Dong Jul 09 '19 at 18:30
  • Assuming I know the subcategories are all correct. – Liwei Dong Jul 09 '19 at 18:31
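
The backwards mapping described in these comments could be built by majority vote. The following is only a sketch in base R, and it rests on two assumptions that are not stated in the question: "Others" is never a valid category, and within each subcategory the most frequent remaining category is the correct one.

```r
data <- data.frame(
  category    = c("A", "B", "B", "A", "Others", "Others"),
  subcategory = c("A_sub", "B_sub", "A_sub", "A_sub", "A_sub", "A_sub"),
  stringsAsFactors = FALSE
)

# Assumption: "Others" is a placeholder, so it gets no vote.
trusted <- data[data$category != "Others", ]

# For each subcategory, pick the category that occurs most often.
# Ties are resolved in favor of the alphabetically first category.
lookup <- tapply(
  trusted$category,
  trusted$subcategory,
  function(x) names(which.max(table(x)))
)

# Map every row's subcategory back to its inferred category.
fixed <- as.vector(lookup[data$subcategory])

# Subcategories that only ever appear with "Others" have no vote;
# keep their original category instead of introducing NA.
no_vote <- is.na(fixed)
fixed[no_vote] <- data$category[no_vote]

# Flag the rows that disagreed with the mapping, then overwrite them.
data$mismatch <- data$category != fixed
data$category <- fixed
```

On the sample data this flags rows 3, 5, and 6 and sets their category to "A". The `mismatch` column is a hypothetical helper for reviewing the changes before trusting them; with 140 categories it may be worth inspecting `lookup` by hand first.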

2 Answers


With dplyr, you can use `ifelse()` inside a `mutate()` call. So you could do something like:

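library(dplyr)  # provides %>% and mutate()
data <- data %>%
  # Recode "Others" to "A" when the subcategory is "A_sub"; leave other rows unchanged
  mutate(category = ifelse(category == "Others" & subcategory == "A_sub", "A", as.character(category)))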
SKyJim

Well, if you know that the subcategory is correct, why not just take the A/B prefix from the subcategory string and use it to replace the category, like this:

library(dplyr)  # provides %>% and mutate()
data %>% mutate(category = sub("_.*", "", subcategory))
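
To illustrate what the pattern does: `_.*` matches the first underscore and everything after it, so `sub()` keeps only the text before that underscore. The short sketch below shows this on the sample values. It also shows a caveat using a made-up subcategory name, `home_goods_sub`, which is hypothetical and not taken from the question: the approach only works when the category name contains no underscore of its own.

```r
subcategory <- c("A_sub", "B_sub", "A_sub")

# Drop the first underscore and everything after it.
stripped <- sub("_.*", "", subcategory)
# stripped is c("A", "B", "A")

# Caveat (hypothetical name): a category that itself contains an
# underscore is cut too early, because the match starts at the first "_".
too_short <- sub("_.*", "", "home_goods_sub")      # "home"

# Assuming every subcategory ends in "_sub", anchoring the pattern
# to the end of the string removes only that suffix.
suffix_only <- sub("_sub$", "", "home_goods_sub")  # "home_goods"
```

If the real subcategory names do not embed the category name at all, this string-based approach does not apply and a lookup from subcategory to category is needed instead.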
joshpk