1

I have following data frame (named as 'df'), with columns names as ID, org1 and org2.

Structure:

 ID  org1  org2
 1    C1   NA
 2    C2   NA
 3    C3   C3
 4    C4   C4
 5    C1   NA
 6    NA   C2
 7    NA   C4
 8    NA   NA
 9    NA   NA
 10   C3   C3

Now, I want org1 to acquire values from org2 in case a 'NA' in org1 eg: ID 6

Expected output:

 ID  org1  org2
 1    C1   NA
 2    C2   NA
 3    C3   C3
 4    C4   C4
 5    C1   NA
 6    C2   C2
 7    C4   C4
 8    NA   NA
 9    NA   NA
 10   C3   C3

I tired the following:

   df$org1[is.na(df$org1)] <- as.character(df$org2[is.na(df$org1)])

referring to the following: Replace NA in column with value in adjacent column

But this yielded the following error:

  Warning message:
  In `[<-.factor`(`*tmp*`, is.na(df$org1),  :invalid factor level, NA generated

can someone tell me what I am doing wrong and is there a better solution for this.

Community
  • 1
  • 1
Anubhav Dikshit
  • 1,535
  • 5
  • 20
  • 41

2 Answers2

2

Convert the factor columns to character class by looping through the factor columns with lapply.

  df1[-1] <- lapply(df1[-1], as.character)

and then we can either use pmax. The pmax will get the row wise max element between the two columns. As the 'org2' is almost identical with that of 'org1' except for the NA values, pmax with na.rm=TRUE remove the NA element when there is one NA and non-NA, or if there are two NAs return NA, or have both non-NA identical return the unique element.

df1$org1 <- do.call(pmax, c(df1[-1], list(na.rm=TRUE)))
df1
#   ID org1 org2
#1   1   C1 <NA>
#2   2   C2 <NA>
#3   3   C3   C3
#4   4   C4   C4
#5   5   C1 <NA>
#6   6   C2   C2
#7   7   C4   C4
#8   8 <NA> <NA>
#9   9 <NA> <NA>
#10 10   C3   C3

or use logical index based on is.na, subset the 'org1' based on index and replace it with the corresponding elements of 'org2'.

 ind <- is.na(df1$org1)
 df1$org1[ind] <- df1$org2[ind] 

Or as @David Arenburg mentioned in the comments, we can data.table for faster assignment. We convert the 'data.frame' to 'data.table' (setDT(df1)), using the logical condition in 'i', we assign the 'org1' as 'org2' that corresponds to 'i'.

library(data.table)
setDT(df)[is.na(org1), org1 := org2]   

data

df1 <- structure(list(ID = 1:10, org1 = structure(c(1L, 2L, 3L, 4L,   
 1L, NA, NA, NA, NA, 3L), .Label = c("C1", "C2", "C3", "C4"),
 class = "factor"), 
org2 = structure(c(NA, NA, 2L, 3L, NA, 1L, 3L, NA, NA, 2L
), .Label = c("C2", "C3", "C4"), class = "factor")), .Names = c("ID", 
"org1", "org2"), row.names = c(NA, -10L), class = "data.frame")
akrun
  • 674,427
  • 24
  • 381
  • 486
0

The options(stringAsFactors =F) will make all the columns into string(as.character)

options(stringsAsFactors = F)
mydf <- data.frame(ID = 1:10, org1 = c('C1','C2','C3','C4','C1',NA,NA,NA,NA,'C3'),
                   org2 = c(NA,NA,'C3','C4',NA,'C2','C4',NA,NA,'C3'))
mydf$org1[is.na(mydf$org1)] <- mydf$org2[is.na(mydf$org1)]
RajaSekhar
  • 61
  • 1
  • 6