How to Count Unique rows in a data frame?

Question

I have a data frame in R which has a lot of duplicate records. I am interested in finding out how many records of each are in this data frame.

For example, I have this data frame:

Fake Name       Fake ID    Fake Status   Fake Program
June             0003         Green        PR1
June             0003         Green        PR1
Television       202          Blue         PR3
Television       202          Green        PR3    
Television       202          Green        PR3
CRT              12           Red          PR0

And from the above I would want to get something that's like similar to below:

Fake Name       Fake ID    Fake Status   Fake Program     COUNT
June             0003         Green        PR1              2
Television       202          Blue         PR3              1
Television       202          Green        PR3              2
CRT              12           Red          PR0              1

Any help would be appreciated. Thank you.

unique(DF) returns to me a list of unique rows found in my df. I need a count of how often a unique value appears joined next to my actuall df. — Alokin, Jul 10 '18 at 19:19

Psidom · Accepted Answer · 2018-07-10T20:11:01.813

Use group_by_all then count the number of rows with n:

df %>% group_by_all() %>% summarise(COUNT = n())

# A tibble: 4 x 5
# Groups:   Fake.Name, Fake.ID, Fake.Status [?]
#  Fake.Name  Fake.ID Fake.Status Fake.Program COUNT
#  <fct>        <int> <fct>       <fct>        <int>
#1 CRT             12 Red         PR0              1
#2 June             3 Green       PR1              2
#3 Television     202 Blue        PR3              1
#4 Television     202 Green       PR3              2

Or even better as from @Ryan's comment:

df %>% group_by_all %>% count

Rui Barradas · Answer 2 · 2019-04-17T17:31:50.137

The following uses duplicated to get the result data.frame and then rle to get the counts.

res <- dat[!duplicated(dat), ]

d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths

res
#   Fake Name Fake ID Fake Status Fake Program COUNT
#1       June    0003       Green          PR1     2
#3 Television     202        Blue          PR3     1
#4 Television     202       Green          PR3     2
#6        CRT      12         Red          PR0     1

Benjamin Christoffersen · Answer 3 · 2018-07-10T19:57:45.517

To the question

How to Count Unique rows in a data frame?

Then use sum and duplicated. E.g.,

df <- data.frame(
  `Fake Name` = c(
    "June", "June", "Television", "Television", "Television", "CRT"),
  `Fake ID` = c("0003", "0003", "202", "202", "202", "12"),
  `Fake Status` = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  `Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"), 
  check.names = FALSE)
df
#R    Fake Name Fake ID Fake Status Fake Program
#R 1       June    0003       Green          PR1
#R 2       June    0003       Green          PR1
#R 3 Television     202        Blue          PR3
#R 4 Television     202       Green          PR3
#R 5 Television     202       Green          PR3
#R 6        CRT      12         Red          PR0
sum(!duplicated(df))
#R [1] 4

For the table you request then you can use data.table as follows

library(data.table)
df <- data.table(df)
df[, .(COUNT = .N), by = names(df)]
#R     Fake Name Fake ID Fake Status Fake Program COUNT
#R 1:       June    0003       Green          PR1     2
#R 2: Television     202        Blue          PR3     1
#R 3: Television     202       Green          PR3     2
#R 4:        CRT      12         Red          PR0     1

score -1 · Answer 4 · answered Jul 16 '20 at 17:15

-1

you could use:

n_distinct(data$col)

answered Jul 16 '20 at 17:15

Jack Novack

1

This seems more like a comment than an answer. See [answer]. You may have to build up some more reputation before being able to comment on Stack Overflow https://stackoverflow.com/help/privileges/comment – Peter Jul 16 '20 at 23:29

How to Count Unique rows in a data frame?

4 Answers4