r data formatting with NAs

Question

If I have a dataset with three columns like this below

 Id    Date       Gender
 1     NA         F
 1     NA         NA
 1     03-11-1977 NA
 2     04-17-2005 NA
 2     NA         M
 3     NA         NA 
 3     06-04-1999 NA
 3     NA         F

How could I clean this data such that I see a dataset like this below ?

Id    Date        Gender
 1     03-11-1977 F
 2     04-17-2005 M
 3     06-04-1999 F

Thanks.

Does this answer your question? [Remove rows with all or some NAs (missing values) in data.frame](https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame) — Gnin, Jun 03 '20 at 05:50

score 1 · Answer 1 · answered Jun 03 '20 at 05:53

You may use na.omit in a by approach.

dat <- do.call(rbind, by(dat, dat$Id, function(x) cbind(x[1,1,drop=F], lapply(x[-1], na.omit))))
dat
#   Id       Date Gender
# 1  1 03-11-1977      F
# 2  2 04-17-2005      M
# 3  3 06-04-1999      F

Data:

dat <- read.table(header=T,text=' Id    Date       Gender
 1     NA         F
 1     NA         NA
 1     03-11-1977 NA
 2     04-17-2005 NA
 2     NA         M
 3     NA         NA 
 3     06-04-1999 NA
 3     NA         F')

@KimJenkins Which error? Could you be more specific? ;) – jay.sf Jun 03 '20 at 06:43 — jay.sf, Jun 03 '20 at 06:43

score 1 · Accepted Answer · answered Jun 03 '20 at 05:56

fill the values by Id and filter NA values.

library(dplyr)
df %>%
  group_by(Id) %>%
  tidyr::fill(Gender, .direction = "updown") %>%
  filter(!is.na(Date))

#     Id Date       Gender
#  <int> <chr>      <chr> 
#1     1 03-11-1977 F     
#2     2 04-17-2005 M     
#3     3 06-04-1999 F

r data formatting with NAs

2 Answers2