I want to replace multiple values in an R data frame using `setNames`, as described in https://stackoverflow.com/a/7548031/4169924, but it gives (seemingly) unexpected results. The following:

df1 <- data.frame(Measure = c("Min", "min", "Minimum"), Value = c(1,2,3))
map1 = setNames(c("Minimum", "Minimum", "Minimum"), c("Min", "min", "Minimum"))
df1$Measure <- map1[df1$Measure]
df1

Gives the expected result:

  Measure Value
1 Minimum     1
2 Minimum     2
3 Minimum     3

However, for

df2 <- data.frame(Measure = c("Min", "min", "Minimum", "MaxVal"), Value = c(1,2,3,4))
map2 = setNames(c("Minimum", "Minimum", "Minimum", "MaxVal"), c("Min", "min", "Minimum", "MaxVal"))
df2$Measure <- map2[df2$Measure]
df2

I get:

  Measure Value
1 Minimum     1
2 Minimum     2
3  MaxVal     3
4 Minimum     4

The `Measure` values in rows 3 and 4 seem to be replaced incorrectly. Why?

bugfoot

1 Answer

Because the first column is a factor, not a character vector: before R 4.0.0, `data.frame()` converts strings to factors by default. Create it with:

df2 <- data.frame(Measure = c("Min", "min", "Minimum", "MaxVal"), Value = c(1,2,3,4), stringsAsFactors=FALSE)

Or convert at this step:

df2$Measure <- map2[as.character(df2$Measure)]
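To check the diagnosis, here is a minimal sketch that forces the factor conversion explicitly with `stringsAsFactors = TRUE`, so it should behave the same on any R version, and then applies the `as.character()` lookup. The variable name `fixed` is only for illustration and is not from the question.

```r
# Force factor conversion explicitly so the example does not depend on
# the R version's default for stringsAsFactors.
df2 <- data.frame(Measure = c("Min", "min", "Minimum", "MaxVal"),
                  Value = c(1, 2, 3, 4),
                  stringsAsFactors = TRUE)
map2 <- setNames(c("Minimum", "Minimum", "Minimum", "MaxVal"),
                 c("Min", "min", "Minimum", "MaxVal"))

# The column is a factor: integer codes plus a set of level labels.
is.factor(df2$Measure)

# Indexing with the character labels matches on the names of map2.
# `fixed` is a hypothetical name used only in this sketch.
fixed <- unname(map2[as.character(df2$Measure)])
fixed
```

Because the lookup goes through the labels rather than the integer codes, the result does not depend on how the factor levels happen to be ordered.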
mpjdem
  • Thanks @mpjdem, so `setNames` can only be used to replace character vectors, but not factors, in which case it behaves unpredictably... – bugfoot Dec 14 '16 at 14:38
  • It is not unpredictable if you look at what's inside the factor, with `as.integer(df2$Measure)`. R uses the factor's integer codes as numerical indices into the map, instead of matching the labels to the names. And by default the factor levels are simply sorted alphabetically. – mpjdem Dec 14 '16 at 14:42
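The mechanism described in this comment can be sketched as follows. The level order is set explicitly here, which is an assumption chosen to match the output shown in the question; the default sort order of mixed-case strings depends on the locale. The names `f`, `codes`, `by_position`, and `by_name` are illustrative only.

```r
map2 <- setNames(c("Minimum", "Minimum", "Minimum", "MaxVal"),
                 c("Min", "min", "Minimum", "MaxVal"))

# Assumption: levels are given explicitly so the result is the same in
# every locale; this order reproduces the output in the question.
f <- factor(c("Min", "min", "Minimum", "MaxVal"),
            levels = c("MaxVal", "min", "Min", "Minimum"))

# Each element is stored as the position of its label in the levels.
codes <- as.integer(f)

# Indexing with a factor uses those integer codes, so the lookup is
# positional and the names of map2 are ignored.
by_position <- unname(map2[f])

# Converting to character first makes the lookup match on names.
by_name <- unname(map2[as.character(f)])

codes
by_position
by_name
```

With these levels, `"Minimum"` has code 4 and therefore picks the fourth element of `map2` (`"MaxVal"`), while `"MaxVal"` has code 1 and picks the first element (`"Minimum"`), which is the swap seen in rows 3 and 4.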