Replace NA's belonging to one column with values from another column

Question

I have following data frame (named as 'df'), with columns names as ID, org1 and org2.

Structure:

 ID  org1  org2
 1    C1   NA
 2    C2   NA
 3    C3   C3
 4    C4   C4
 5    C1   NA
 6    NA   C2
 7    NA   C4
 8    NA   NA
 9    NA   NA
 10   C3   C3

Now, I want org1 to acquire values from org2 in case a 'NA' in org1 eg: ID 6

Expected output:

 ID  org1  org2
 1    C1   NA
 2    C2   NA
 3    C3   C3
 4    C4   C4
 5    C1   NA
 6    C2   C2
 7    C4   C4
 8    NA   NA
 9    NA   NA
 10   C3   C3

I tired the following:

   df$org1[is.na(df$org1)] <- as.character(df$org2[is.na(df$org1)])

referring to the following: Replace NA in column with value in adjacent column

But this yielded the following error:

  Warning message:
  In `[<-.factor`(`*tmp*`, is.na(df$org1),  :invalid factor level, NA generated

can someone tell me what I am doing wrong and is there a better solution for this.

akrun · Accepted Answer · 2015-11-09T10:31:24.280

Convert the factor columns to character class by looping through the factor columns with lapply.

  df1[-1] <- lapply(df1[-1], as.character)

and then we can either use pmax. The pmax will get the row wise max element between the two columns. As the 'org2' is almost identical with that of 'org1' except for the NA values, pmax with na.rm=TRUE remove the NA element when there is one NA and non-NA, or if there are two NAs return NA, or have both non-NA identical return the unique element.

df1$org1 <- do.call(pmax, c(df1[-1], list(na.rm=TRUE)))
df1
#   ID org1 org2
#1   1   C1 <NA>
#2   2   C2 <NA>
#3   3   C3   C3
#4   4   C4   C4
#5   5   C1 <NA>
#6   6   C2   C2
#7   7   C4   C4
#8   8 <NA> <NA>
#9   9 <NA> <NA>
#10 10   C3   C3

or use logical index based on is.na, subset the 'org1' based on index and replace it with the corresponding elements of 'org2'.

 ind <- is.na(df1$org1)
 df1$org1[ind] <- df1$org2[ind]

Or as @David Arenburg mentioned in the comments, we can data.table for faster assignment. We convert the 'data.frame' to 'data.table' (setDT(df1)), using the logical condition in 'i', we assign the 'org1' as 'org2' that corresponds to 'i'.

library(data.table)
setDT(df)[is.na(org1), org1 := org2]

data

df1 <- structure(list(ID = 1:10, org1 = structure(c(1L, 2L, 3L, 4L,   
 1L, NA, NA, NA, NA, 3L), .Label = c("C1", "C2", "C3", "C4"),
 class = "factor"), 
org2 = structure(c(NA, NA, 2L, 3L, NA, 1L, 3L, NA, NA, 2L
), .Label = c("C2", "C3", "C4"), class = "factor")), .Names = c("ID", 
"org1", "org2"), row.names = c(NA, -10L), class = "data.frame")

@user3875610 Based on the input data (assuming you have factor columns from the error), I am not getting any warnings. — akrun, Nov 09 '15 at 10:24
@ akrun, yes it has factors but i tired on a sample data it worked thanks, although I found David's solution a lot more easy. Akrun can you explain what exactly your code did — Anubhav Dikshit, Nov 09 '15 at 10:26

RajaSekhar · Answer 2 · 2015-11-09T12:46:34.860

0

The options(stringAsFactors =F) will make all the columns into string(as.character)

options(stringsAsFactors = F)
mydf <- data.frame(ID = 1:10, org1 = c('C1','C2','C3','C4','C1',NA,NA,NA,NA,'C3'),
                   org2 = c(NA,NA,'C3','C4',NA,'C2','C4',NA,NA,'C3'))
mydf$org1[is.na(mydf$org1)] <- mydf$org2[is.na(mydf$org1)]

edited Nov 09 '15 at 12:46

answered Nov 09 '15 at 12:40

RajaSekhar

61
1
6

This was already introduced in the other answer just without using `which`, which is redundant. – David Arenburg Nov 09 '15 at 12:42
Yeah which is redundant, made the chnges – RajaSekhar Nov 09 '15 at 12:48
But that is already present in the other answer, as already mentioned. – David Arenburg Nov 09 '15 at 12:49

Replace NA's belonging to one column with values from another column

2 Answers2

data