You could try something like this, where you loop over the elements in x and then use an ifelse statement:
x <- c("AA1", "AA2", "AA3", "AA4")
db <- data.frame(codes = sample(x, 10, TRUE))
db_new <- cbind(db, Reduce(cbind, lapply(x, function(i) ifelse(db$codes == i, 1, 0))))
If db is:
   codes
1    AA4
2    AA1
3    AA4
4    AA1
5    AA2
6    AA4
7    AA4
8    AA1
9    AA1
10   AA1
Then the output becomes:
   codes init V2 V3 V4
1    AA4    0  0  0  1
2    AA1    1  0  0  0
3    AA4    0  0  0  1
4    AA1    1  0  0  0
5    AA2    0  1  0  0
6    AA4    0  0  0  1
7    AA4    0  0  0  1
8    AA1    1  0  0  0
9    AA1    1  0  0  0
10   AA1    1  0  0  0
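The column names init, V2, V3 and V4 in that output appear to be a side effect of building the matrix with Reduce(cbind, ...). If you would rather have the indicator columns named after the codes, a minimal sketch along the same lines is to let sapply build the matrix, since it uses the elements of a character vector as column names. The set.seed call and the names ind and db_named are my own additions for illustration, not part of the code above.

```r
x <- c("AA1", "AA2", "AA3", "AA4")
set.seed(1)  # assumption: seed added only so the example is reproducible
db <- data.frame(codes = sample(x, 10, TRUE))

# sapply over a character vector names the result after its elements,
# so ind is a 10 x 4 integer matrix with columns AA1 ... AA4
ind <- sapply(x, function(i) as.integer(db$codes == i))

# ind and db_named are illustrative names
db_named <- cbind(db, ind)
colnames(db_named)
```

Each row of ind should contain exactly one 1, because every value in codes is drawn from x.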
EDIT:
It appears that your subscript is wrong. db$code[j] will take the jth element of the column code in db, so that will not work. You could try this:
Assuming that you are using the same codes for all columns and that they are given in x:
x <- c("AA1", "AA2", "AA3", "AA4")
Furthermore, assume that all your code columns are in your data.frame and that this is the only data in your data.frame.
db <- data.frame(codes_1 = sample(x, 10, TRUE),
                 codes_2 = sample(x, 10, TRUE))
Then we can use the fact that a data.frame works like a list and can be passed through lapply:
db_list <- lapply(seq_along(db), function(i, x) {
  var <- db[[i]]                 # values of the i-th column as a vector
  var_name <- colnames(db[i])    # name of the i-th column
  # one 0/1 indicator column per code in x
  db_tmp <- cbind(db[i], Reduce(cbind, lapply(x, function(j) ifelse(var == j, 1, 0))))
  colnames(db_tmp) <- c(var_name, paste(var_name, x, sep = "_"))
  return(db_tmp)
}, x)
[[1]]
   codes_1 codes_1_AA1 codes_1_AA2 codes_1_AA3 codes_1_AA4
1      AA1           1           0           0           0
2      AA2           0           1           0           0
3      AA4           0           0           0           1
4      AA3           0           0           1           0
5      AA3           0           0           1           0
6      AA3           0           0           1           0
7      AA3           0           0           1           0
8      AA1           1           0           0           0
9      AA4           0           0           0           1
10     AA4           0           0           0           1

[[2]]
   codes_2 codes_2_AA1 codes_2_AA2 codes_2_AA3 codes_2_AA4
1      AA4           0           0           0           1
2      AA3           0           0           1           0
3      AA3           0           0           1           0
4      AA3           0           0           1           0
5      AA4           0           0           0           1
6      AA1           1           0           0           0
7      AA4           0           0           0           1
8      AA2           0           1           0           0
9      AA2           0           1           0           0
10     AA3           0           0           1           0
This gives you a list whose length is the number of columns that you have, each element holding the desired matrix for one column. If you want to get it all back into one data.frame, you can do this:
Reduce(cbind, db_list)
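To check the combined result end to end, here is a hedged, self-contained sketch of the same idea. It uses do.call(cbind, db_list), which should give the same data.frame as the Reduce call while binding all list elements in a single cbind call. The set.seed call, the helper name make_dummies and the name db_all are assumptions of mine for illustration, not part of the code above.

```r
x <- c("AA1", "AA2", "AA3", "AA4")
set.seed(42)  # assumption: seed added only so the example is reproducible
db <- data.frame(codes_1 = sample(x, 10, TRUE),
                 codes_2 = sample(x, 10, TRUE))

# make_dummies is a hypothetical helper name: it returns one column of
# `data` followed by a 0/1 indicator column for every entry of `codes`
make_dummies <- function(var_name, data, codes) {
  ind <- sapply(codes, function(j) as.integer(data[[var_name]] == j))
  colnames(ind) <- paste(var_name, codes, sep = "_")
  cbind(data[var_name], ind)
}

# One data.frame per code column, as in db_list above
db_list <- lapply(names(db), make_dummies, data = db, codes = x)

# Bind all list elements in one call instead of pairwise with Reduce
db_all <- do.call(cbind, db_list)
dim(db_all)
```

With two code columns and four codes, db_all has the two original columns plus eight indicator columns.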