
I have a data frame like this:

idx  type  val1 val2 val3 val4 val5 val6
1    a     0.2   NA   NA   NA   NA   NA
2    a     0.3   NA   NA   NA   NA   NA 
3    a     0.2   NA   NA   NA   NA   NA
4    a     NA    0.3  NA   NA   NA   NA 
5    a     NA    0.5  NA   NA   NA   NA
6    a     NA    0.2  NA   NA   NA   NA
7    a     NA    NA   0.2  NA   NA   NA
8    a     NA    NA   0.5  NA   NA   NA
9    a     NA    NA   0.4  NA   NA   NA
10   a     NA    NA   NA   0.4  NA   NA
11   a     NA    NA   NA   0.6  NA   NA
12   a     NA    NA   NA   0.6  NA   NA
.
. 
.
34   b     NA    NA   NA   NA   NA   0.6
35   b     NA    NA   NA   NA   NA   0.4
36   b     NA    NA   NA   NA   NA   0.3

I want to combine the rows and remove the NAs. This is what I want to achieve:

idx  type  val1 val2 val3 val4 val5 val6
1    a     0.2  0.3  0.2  0.4  0.3  0.2
2    a     0.3  0.5  0.5  0.6  0.4  0.5
3    a     0.2  0.2  0.4  0.6  0.5  0.6
4    b     0.4  0.2  0.2  0.5  0.4  0.6
5    b     0.3  0.5  0.6  0.3  0.6  0.4
6    b     0.3  0.4  0.3  0.6  0.5  0.3
– Tpg333 (edited by lmo)
  • `na.omit`? `drop_na`? – tyluRp Nov 16 '17 at 23:29
  • Your example is confusing to me. How are you moving values up within a column but still retaining the same `idx`? I.e., the values `0.3/0.5/0.2` in `val2` used to be associated with `idx` `4/5/6`, but now they are against `idx` `1/2/3`. What is the logic here? – thelatemail Nov 16 '17 at 23:35
  • Well, that's what OP wants to know. Basically removing NA values by column, not row, and then shifting up. – spinodal Nov 16 '17 at 23:37
  • @spinodal - the point is there is no way to really tell what exactly OP needs here. Is the `type` locked by row to a value but the `idx` isn't? Does every column have 3 values for `type=='a'` and 3 for `type=='b'`? A slightly smaller example showing every value in the "before" data.frame and every value in the "after" would be ideal. We can all guess what the expected output might be, but what's the point if it can be made obvious? – thelatemail Nov 17 '17 at 00:22
  • Sorry for the confusion, the `idx` column is not important and we can drop it. Also, `type` is not locked to any value. What @spinodal said was correct: I want to remove the NAs in each column, shift the values up and reduce the total number of rows. – Tpg333 Nov 17 '17 at 16:37
  • In general it's useful to share your data using `dput()` if possible, e.g. `dput(test.df)` gives `structure(list(cat = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"), x1 = c(1, NA, NA, 5.5, NA, NA), x2 = c(NA, 2, NA, NA, 4.5, NA), x3 = c(NA, NA, 3, NA, NA, 3.5)), .Names = c("cat", "x1", "x2", "x3"), row.names = c(NA, -6L), class = "data.frame")`. That way people can copy/paste the data into code. – Max Candocia Nov 17 '17 at 19:20
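Following the clarification in the comments (drop `idx`, remove the NAs in each column, shift the values up), a minimal base-R sketch could look like this. The miniature data frame `df` and the helper `collapse_group()` are hypothetical; the sketch assumes `type` is the first column and that, within each `type`, every value column keeps the same number of non-NA values. It also shows why a plain `na.omit()` on the whole frame does not help: it drops every row that contains an NA, and here every row has one.

```r
# Hypothetical miniature of the question's layout: within each type, each
# val column is filled on its own block of rows and NA elsewhere.
df <- data.frame(
  type = rep(c("a", "b"), each = 4),
  val1 = c(0.2, 0.3, NA, NA, 0.4, 0.3, NA, NA),
  val2 = c(NA, NA, 0.3, 0.5, NA, NA, 0.2, 0.5)
)

# Row-wise NA removal: every row has at least one NA, so nothing survives.
nrow(na.omit(df))  # 0

# Per-column removal within one type: keeping only each column's non-NA
# values shifts them to the top. Assumes `type` is the first column and all
# value columns keep the same number of values within the group.
collapse_group <- function(g) {
  vals <- lapply(g[-1], function(v) v[!is.na(v)])
  data.frame(type = g$type[seq_along(vals[[1]])], vals)
}

# Apply per type and stack the pieces back together.
out <- do.call(rbind, lapply(split(df, df$type), collapse_group))
rownames(out) <- NULL
out
```

Splitting by `type` first keeps the `a` and `b` values from being mixed when they are shifted up.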

1 Answer


You could apply a function that drops the NAs to each column of interest, then bind the results to the category column, filtered so that it has one label per collapsed row:

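test.df = data.frame(cat = rep(c('a','b'), each=3),
                     x1=c(1,NA,NA,5.5,NA,NA),
                     x2=c(NA,2,NA,NA,4.5,NA),
                     x3=c(NA,NA,3.,NA,NA,3.5))

# return the non-NA values of one column (this shifts them up)
collapse_column <- function(data, col){
  data[!is.na(data[,col]),col]
}

# apply to columns 2:4; sapply() returns a matrix only if every column
# keeps the same number of non-NA values
main_vals = sapply(2:4, collapse_column, data=test.df)

# one category label per collapsed row, taken from the rows where x1 is not NA
cat_vals = test.df[!is.na(test.df[,2]),'cat']

# data.frame() rather than cbind(): cbind() builds a single-type matrix, so
# either the values become character or the labels become factor codes
new_df = data.frame(cat_vals, main_vals)
names(new_df) = names(test.df)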
– Max Candocia
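One caveat on the code above: `sapply()` only simplifies to a matrix when every column keeps the same number of non-NA values. Otherwise it returns a list, and the later steps no longer produce the intended table. The sketch below pads shorter columns with NA instead. The name `collapse_all()` is hypothetical, and padding with NA is an assumption about what you would want when the columns are uneven.

```r
# Drop the NAs in every column, shift the remaining values up, and pad
# shorter columns with NA so the result is still rectangular.
collapse_all <- function(df) {
  vals <- lapply(df, function(v) v[!is.na(v)])
  n <- max(lengths(vals))
  # indexing past the end of a vector returns NA, which does the padding
  as.data.frame(lapply(vals, function(v) v[seq_len(n)]))
}

# Uneven example: x1 keeps two values, x2 only one.
uneven <- data.frame(x1 = c(1, NA, 5.5), x2 = c(NA, 2, NA))
collapse_all(uneven)
```

Here `x1` becomes `1, 5.5` and `x2` becomes `2, NA`. To keep the categories separate, you could apply it per `type` group, as in the `split()` approach.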