2

I have a dataframe which has the following form

ID1 | VALUE |ID2 | VALUE | ID3 | VALUE
111  | one   |333 | three | 555 | five
222  | two   |    | NA    | 666 | six

Note that the number of values/rows in each column varies (NAs/empty values).

I want the result to look like this:

ID  | VALUE 
111 | one
222 | two
333 | three
555 | five
666 | six

I tried the following:

x = as.matrix(df)
x= matrix(x,ncol = 2)

This process mixes IDs and VALUEs for some columns. My guess is that this is because of NA and NULL values.

ID  | VALUE 
111 | one
222 | two
333 | three
five| 555
six | 666

Could you please suggest a quick solution(s) to my problem?

Elegant Solutions

Thanks to nongkrong and akrun for answers. I don't know about speed, but the most elegant solutions that got the job done are the following:

  • my_final = data.frame(VALUE=unlist(my_data[c(TRUE,FALSE)]), ID=unlist(my_data[c(FALSE, TRUE)]))
  • my_final = setNames(data.frame(lapply(split(as.list(my_data), c(F, T)), unlist)), c("VALUE", "ID"))

Also to omit empty rows:

  • my_final_trimmed = my_final[complete.cases(my_final), ]
Community
  • 1
  • 1
dbudaghyan
  • 53
  • 1
  • 8

3 Answers3

2

One option is using melt from the devel version of data.table. Instructions to install the devel version are here.

We convert the 'data.frame' to 'data.table' (setDT(df1)). Using melt, we specify the measure argument with regex patterns of column names that start with 'ID' and 'VALUE'. Optional arguments include naming the 'value' columns i.e. value.name and removing the NA values with na.rm=TRUE. If we don't need the 'variable' column, it can be assigned (:=) to NULL.

library(data.table)#v1.9.5+
melt(setDT(df1), measure= patterns("^ID", "^VALUE"), 
   value.name= c("ID", "VALUE"), na.rm=TRUE)[,variable:=NULL][]
#   ID VALUE
#1: 111   one
#2: 222   two
#3: 333 three
#4: 555  five
#5: 666   six

data

df1 <- structure(list(ID1 = c(111L, 222L), VALUE = c("one", "two"), 
ID2 = c(333L, NA), VALUE.1 = c("three", NA), ID3 = c(555L, 
666L), VALUE.2 = c("five", "six")), .Names = c("ID1", "VALUE", 
"ID2", "VALUE.1", "ID3", "VALUE.2"), class = "data.frame",
row.names = c(NA, -2L))
akrun
  • 674,427
  • 24
  • 381
  • 486
1

Well, sorry if this answer is too specific for your case, but taking the data frame in that form, the answer is approachable (inelegantly) as follows:

1) annihilate column names

colnames(dataframe_name) <- NULL

2) split into smaller data frames by subsetting

df1 <- dataframe_name[, 1:2]  # columns 1-2
df2 <- dataframe_name[, 3:4]  # columns 3-4
df3 <- dataframe_name[, 5:6]  # columns 5-6

3) add column names to the data frames

f <- function(df) {
    colnames(df) <- c('ID', 'VALUE')  # set colnames for df with 2 cols
    return(df)
} # function to apply to each df

l <- list(df1, df2, df3)
list_of_frames <- lapply(l, f)  # apply colname function to each

4) bind the data frames together

do.call(rbind, list_of_frames)  # call rbind (row bind) on each frame
Illumin8s
  • 121
  • 1
  • 8
1

Another base R, using split to separate alternating columns

res <- setNames(
    data.frame(lapply(split(as.list(dat), c(F, T)), unlist))
  , c("VALUE", "ID"))

res[complete.cases(res), ]
#      VALUE      ID
# ID11   111  one   
# ID12   222  two   
# ID21   333  three 
# ID31   555    five
# ID32   666     six
Rorschach
  • 28,468
  • 5
  • 66
  • 112