0

If I have a dataset with three columns like this below

 Id    Date       Gender
 1     NA         F
 1     NA         NA
 1     03-11-1977 NA
 2     04-17-2005 NA
 2     NA         M
 3     NA         NA 
 3     06-04-1999 NA
 3     NA         F

How could I clean this data such that I see a dataset like this below ?

Id    Date        Gender
 1     03-11-1977 F
 2     04-17-2005 M
 3     06-04-1999 F

Thanks.

Kim Jenkins
  • 430
  • 2
  • 16
  • 1
    Does this answer your question? [Remove rows with all or some NAs (missing values) in data.frame](https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame) – Gnin Jun 03 '20 at 05:50
  • @Gnin Probably not, OP's data is grouped by `Id`. – jay.sf Jun 03 '20 at 05:56

2 Answers2

1

You may use na.omit in a by approach.

dat <- do.call(rbind, by(dat, dat$Id, function(x) cbind(x[1,1,drop=F], lapply(x[-1], na.omit))))
dat
#   Id       Date Gender
# 1  1 03-11-1977      F
# 2  2 04-17-2005      M
# 3  3 06-04-1999      F

Data:

dat <- read.table(header=T,text=' Id    Date       Gender
 1     NA         F
 1     NA         NA
 1     03-11-1977 NA
 2     04-17-2005 NA
 2     NA         M
 3     NA         NA 
 3     06-04-1999 NA
 3     NA         F')
jay.sf
  • 33,483
  • 5
  • 39
  • 75
1

fill the values by Id and filter NA values.

library(dplyr)
df %>%
  group_by(Id) %>%
  tidyr::fill(Gender, .direction = "updown") %>%
  filter(!is.na(Date))

#     Id Date       Gender
#  <int> <chr>      <chr> 
#1     1 03-11-1977 F     
#2     2 04-17-2005 M     
#3     3 06-04-1999 F     
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143