9

I have a data frame in R which has a lot of duplicate records. I am interested in finding out how many records of each are in this data frame.

For example, I have this data frame:

Fake Name       Fake ID    Fake Status   Fake Program
June             0003         Green        PR1
June             0003         Green        PR1
Television       202          Blue         PR3
Television       202          Green        PR3    
Television       202          Green        PR3
CRT              12           Red          PR0

And from the above I would want to get something that's like similar to below:

Fake Name       Fake ID    Fake Status   Fake Program     COUNT
June             0003         Green        PR1              2
Television       202          Blue         PR3              1
Television       202          Green        PR3              2
CRT              12           Red          PR0              1

Any help would be appreciated. Thank you.

Alokin
  • 287
  • 2
  • 12
  • unique(DF) returns to me a list of unique rows found in my df. I need a count of how often a unique value appears joined next to my actuall df. – Alokin Jul 10 '18 at 19:19

4 Answers4

16

Use group_by_all then count the number of rows with n:

df %>% group_by_all() %>% summarise(COUNT = n())

# A tibble: 4 x 5
# Groups:   Fake.Name, Fake.ID, Fake.Status [?]
#  Fake.Name  Fake.ID Fake.Status Fake.Program COUNT
#  <fct>        <int> <fct>       <fct>        <int>
#1 CRT             12 Red         PR0              1
#2 June             3 Green       PR1              2
#3 Television     202 Blue        PR3              1
#4 Television     202 Green       PR3              2

Or even better as from @Ryan's comment:

df %>% group_by_all %>% count
Psidom
  • 171,477
  • 20
  • 249
  • 286
3

The following uses duplicated to get the result data.frame and then rle to get the counts.

res <- dat[!duplicated(dat), ]

d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths

res
#   Fake Name Fake ID Fake Status Fake Program COUNT
#1       June    0003       Green          PR1     2
#3 Television     202        Blue          PR3     1
#4 Television     202       Green          PR3     2
#6        CRT      12         Red          PR0     1
Rui Barradas
  • 44,483
  • 8
  • 22
  • 48
2

To the question

How to Count Unique rows in a data frame?

Then use sum and duplicated. E.g.,

df <- data.frame(
  `Fake Name` = c(
    "June", "June", "Television", "Television", "Television", "CRT"),
  `Fake ID` = c("0003", "0003", "202", "202", "202", "12"),
  `Fake Status` = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  `Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"), 
  check.names = FALSE)
df
#R    Fake Name Fake ID Fake Status Fake Program
#R 1       June    0003       Green          PR1
#R 2       June    0003       Green          PR1
#R 3 Television     202        Blue          PR3
#R 4 Television     202       Green          PR3
#R 5 Television     202       Green          PR3
#R 6        CRT      12         Red          PR0
sum(!duplicated(df))
#R [1] 4

For the table you request then you can use data.table as follows

library(data.table)
df <- data.table(df)
df[, .(COUNT = .N), by = names(df)]
#R     Fake Name Fake ID Fake Status Fake Program COUNT
#R 1:       June    0003       Green          PR1     2
#R 2: Television     202        Blue          PR3     1
#R 3: Television     202       Green          PR3     2
#R 4:        CRT      12         Red          PR0     1
-1

you could use:

n_distinct(data$col)
  • This seems more like a comment than an answer. See [answer]. You may have to build up some more reputation before being able to comment on Stack Overflow https://stackoverflow.com/help/privileges/comment – Peter Jul 16 '20 at 23:29