Note: this answer started as an attempt to solve the much simpler problem posted in "How to replace all values in data frame with a vector of values?". Unfortunately, that question was closed as a duplicate of this one. So I'll try to suggest a solution based on replacing factor levels for both cases here.
In case there is only a vector (or one data frame column) whose values need to be replaced, and there are no objections to using factor, we can coerce the vector to factor and change the factor levels as required:
x <- c(1, 1, 4, 4, 5, 5, 1, 1, 2)
x <- factor(x)
x
#[1] 1 1 4 4 5 5 1 1 2
#Levels: 1 2 4 5
replacement_vec <- c("A", "T", "C", "G")
levels(x) <- replacement_vec
x
#[1] A A C C G G A A T
#Levels: A T C G
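The assignment to levels(x) is positional: the sorted levels 1, 2, 4, 5 are paired with the replacement values in the order given. As a minimal sketch in base R, the pairing can be spelled out with the levels and labels arguments of factor(). The names old_values, new_labels, and y are only illustrative and are not part of the original answer.

```r
x <- c(1, 1, 4, 4, 5, 5, 1, 1, 2)

# Illustrative names (assumptions): the old values and their replacements,
# paired position by position.
old_values <- c(1, 2, 4, 5)
new_labels <- c("A", "T", "C", "G")

# levels = the values to look for, labels = what each of them becomes
y <- factor(x, levels = old_values, labels = new_labels)

# Convert back to character if a factor is not wanted in the end
as.character(y)
#[1] "A" "A" "C" "C" "G" "G" "A" "A" "T"
```

Any value of x that does not appear in old_values would become NA here.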
Using the forcats package, this can be done in a one-liner:
x <- c(1, 1, 4, 4, 5, 5, 1, 1, 2)
forcats::lvls_revalue(factor(x), replacement_vec)
#[1] A A C C G G A A T
#Levels: A T C G
In case all values of multiple columns of a data frame need to be replaced, the approach can be extended.
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
level_vec <- c("AA", "AC", "AG", "AT", "GC", "GG")
replacement_vec <- c("0101", "0102", "0103", "0104", "0302", "0303")
foo[] <- lapply(foo, function(x) forcats::lvls_revalue(factor(x, levels = level_vec),
replacement_vec))
foo
# snp1 snp2 snp3
#1 0101 0101 <NA>
#2 0103 0104 0303
#3 0101 0103 0303
#4 0101 0101 0302
Note that level_vec and replacement_vec must have equal lengths.
More importantly, level_vec should be complete, i.e., include all possible values in the affected columns of the original data frame. (Use unique(sort(unlist(foo))) to verify.) Otherwise, any value that is not listed in level_vec will be coerced to <NA>. Note that this is also a requirement for Martin Morgan's answer.
So, if there are only a few different values to be replaced, you will probably be better off with one of the other answers, e.g., Ramnath's.
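The two requirements on level_vec mentioned above can be checked before doing the replacement. The following is only a sketch in base R, using the sample data from this answer; the names observed and missing_levels are illustrative.

```r
foo <- data.frame(snp1 = c("AA", "AG", "AA", "AA"),
                  snp2 = c("AA", "AT", "AG", "AA"),
                  snp3 = c(NA, "GG", "GG", "GC"),
                  stringsAsFactors = FALSE)
level_vec <- c("AA", "AC", "AG", "AT", "GC", "GG")
replacement_vec <- c("0101", "0102", "0103", "0104", "0302", "0303")

# Both vectors must have the same length
stopifnot(length(level_vec) == length(replacement_vec))

# All distinct non-NA values that actually occur in the data frame
observed <- unique(unlist(foo, use.names = FALSE))
observed <- observed[!is.na(observed)]

# Values present in the data but absent from level_vec;
# these would silently turn into NA during the replacement
missing_levels <- setdiff(observed, level_vec)
length(missing_levels) == 0
#[1] TRUE
```

If missing_levels is not empty, the listed values need to be added to level_vec (with matching entries in replacement_vec) first.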