
I am attempting to make a vectorized "find and replace" of multiple strings in a data frame. In my mock data frame below, I want to replace "human" with "dog", and "cat" with "moose".

Mock input:

df<-data.frame(organism=c("human","cat","bird","virus","bat","pangolian"),size=c(6,4,2,1,3,5))
df
   organism size
1     human    6
2       cat    4
3      bird    2
4     virus    1
5       bat    3
6 pangolian    5

Expected output:

df1
   organism size
1       dog    6
2     moose    4
3      bird    2
4     virus    1
5       bat    3
6 pangolian    5

In reality, I have a large data frame and many replacements, so I would want to have my replacement strings in a vector like this:

replacement<-c("dog","moose")

I know this is supposed to be simple to solve, but as a newbie I can't get my head around it. So, thanks in advance.

2 Answers


There are multiple ways to do find and replace. The following approach uses only a named vector, similar to a Python dictionary:

organism_map <- unique(as.character(df$organism))
names(organism_map) <- organism_map
organism_map["human"] <- "dog"
organism_map["cat"] <- "moose"

The organism_map contains the mapping:

  human         cat        bird       virus         bat   pangolian 
  "dog"     "moose"      "bird"     "virus"       "bat" "pangolian" 

Then look up each value of df$organism by name in organism_map and save the result back to the df$organism column. The as.character() calls make this work whether the column is a factor or a character vector (levels() returns NULL for a character column, which is the default since R 4.0.0):

df$organism <- unname(organism_map[as.character(df$organism)])

Result:

   organism size
1       dog    6
2     moose    4
3      bird    2
4     virus    1
5       bat    3
6 pangolian    5
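
Since the question asks for the replacements to live in a vector, the same named-vector lookup can be built from two parallel vectors instead of assigning entries one by one. This is a minimal sketch, assuming the vectors pattern and replacement are in matching order (pattern is not in the question and is introduced here for illustration):

```r
# Rebuild the mock data; stringsAsFactors = FALSE keeps organism as character
df <- data.frame(
  organism = c("human", "cat", "bird", "virus", "bat", "pangolian"),
  size = c(6, 4, 2, 1, 3, 5),
  stringsAsFactors = FALSE
)

pattern     <- c("human", "cat")   # values to find (assumed name)
replacement <- c("dog", "moose")   # values to substitute, in the same order

# Named vector: names are the old values, elements are the new ones
lookup <- setNames(replacement, pattern)

# Touch only the rows whose value has an entry in the lookup
hits <- df$organism %in% names(lookup)
df$organism[hits] <- unname(lookup[df$organism[hits]])
```

Rows without an entry in lookup are left unchanged, so the map only has to list the values that actually change.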
Emer

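You can use str_replace_all from stringr with a named vector of the form c(pattern = replacement), so that every pattern is tried on every row. (Passing the two vectors directly to str_replace pairs them with the rows by recycling, which only works when the values to replace happen to sit in the matching positions.) The patterns are regular expressions, so they are anchored with ^ and $ here to match whole values only:

pattern <- c('human', 'cat')
replacement <- c('dog', 'moose')
df$organism <- stringr::str_replace_all(as.character(df$organism), setNames(replacement, paste0('^', pattern, '$')))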
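
For exact, whole-value matching without regular expressions or an extra package, a base R sketch using match() might look like the following. The helper name replace_exact and the sample vector (including "caterpillar") are hypothetical and only serve to show that partial matches are left alone:

```r
# Hypothetical helper: replace whole values of x found in `pattern`
# with the element of `replacement` at the same position.
replace_exact <- function(x, pattern, replacement) {
  x <- as.character(x)        # also handles factor input
  idx <- match(x, pattern)    # NA where x has no entry in pattern
  ifelse(is.na(idx), x, replacement[idx])
}

organism <- c("human", "cat", "caterpillar", "bird")
result <- replace_exact(organism, c("human", "cat"), c("dog", "moose"))
print(result)
# "dog" "moose" "caterpillar" "bird"
```

It could be applied to the data frame as df$organism <- replace_exact(df$organism, pattern, replacement).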
Ronak Shah