0

I have a data frame with many rows (say 4 rows in the example below), and each row has a certain number of columns associated with it as below:

1 91 90 20
2 21 NA NA
3 20 20 NA
4 30 NA NA

The numbers 1,2,3 and 4 in the far left are row IDs. I need to extract the rows that contain more than one number across all associated columns. So what I would expect is:

1 91 90 20
3 20 20 NA

I have tried using "which" in combination with "lapply" but this just gives me TRUE or FALSE as output, whereas I need the actual values as above.

StephD
  • 83
  • 1
  • 7

1 Answers1

2

You can do that by using rowSums in conjunction with just checking if there is an na, and filtering to greater than 1.

df[rowSums(!is.na(df)) > 1,]

Breakdown:

df <- data.frame(x = c(91, 21, 20, 30), y = c(90, NA, 20, NA), z = c(20, NA, NA, NA))

We can turn it into a T/F matrix by:

!is.na(df)
        x     y     z
[1,] TRUE  TRUE  TRUE
[2,] TRUE FALSE FALSE
[3,] TRUE  TRUE FALSE
[4,] TRUE FALSE FALSE

This shows where there are and aren't numbers. Now we just need to sum up the rows:

rowSums(!is.na(df))
[1] 3 1 2 1

This yields the # of non-NA entries per row. Now we can change that back into a logical vector by looking for only ones that have more than 1:

rowSums(!is.na(df)) > 1
[1]  TRUE FALSE  TRUE FALSE

Now subset the df with that:

df[rowSums(!is.na(df)) > 1,]
   x  y  z
1 91 90 20
3 20 20 NA
Matt W.
  • 3,331
  • 16
  • 40