How can I count the number of distinct visit_ids per pagename?

visit_id  post_pagename
1       A
1       B
1       C
1       D 
2       A
2       A
3       A
3       B

Result should be:

post_pagename distinct_visit_ids
A     3
B     2
C     1
D     1

I tried it with:

test_df<-data.frame(cbind(c(1,1,1,1,2,2,3,3),c("A","B","C","D","A","A","A","B")))
colnames(test_df)<-c("visit_id","post_pagename")
test_df

test_df %>%
 group_by(post_pagename) %>%
  summarize(vis_count = n_distinct(visit_id))

But this only gives me the total number of distinct visit_id values in my whole data set, not a count per pagename.

flobrr

2 Answers

One way

test_df %>%
  distinct() %>%
  count(post_pagename)

#   post_pagename     n
#   <fct>         <int>
# 1 A                 3
# 2 B                 2
# 3 C                 1
# 4 D                 1

Or another

test_df %>%
  group_by(post_pagename) %>%
  summarise(distinct_visit_ids = n_distinct(visit_id))

# A tibble: 4 x 2
#  post_pagename distinct_visit_ids
#  <fct>                      <int>
#1 A                              3
#2 B                              2
#3 C                              1
#4 D                              1

*D has one visit, so it must be counted*
utubun

The function n_distinct() returns the number of distinct values in the vector you pass to it. Since your data contains the row "2 A" twice, you can instead remove the duplicate rows first and then use n(), which counts the number of rows in each group of your grouping variable.

test_df<-data.frame(cbind(c(1,1,1,1,2,2,3,3),c("A","B","C","D","A","A","A","B")))
colnames(test_df)<-c("visit_id","post_pagename")
test_df


test_df %>%
  unique() %>%
  group_by(post_pagename) %>%
  summarize(vis_count = n())

This should work fine.

Hope it helps :)

Giovana Stein
  • I get an error: "Error: This function should not be called directly" – flobrr Jun 20 '18 at 17:30
  • Try dplyr::summarize(vis_count = n()) – Giovana Stein Jun 20 '18 at 17:36
  • This means you need the summarize function from the dplyr package. You can read more about this error here: https://stackoverflow.com/questions/22801153/dplyr-error-in-n-function-should-not-be-called-directly – Giovana Stein Jun 20 '18 at 17:37
  • Giovana: I added dplyr::summarize and no error occurs. But the result is not correct. Please compare it with the expected result in my question. – flobrr Jun 20 '18 at 17:37
  • Giovana: your query just counts the number of occurrences of each pagename: 4xA, 2xB, 1xC, 1xD – flobrr Jun 20 '18 at 17:40
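
A note on the error from the comments: that message, and a summary that ignores the grouping, commonly shows up when plyr is attached after dplyr, so that plyr's summarize masks dplyr's (this is what the linked question describes). Whether that is what happened here is an assumption, since the question does not show which packages were loaded. A minimal sketch that sidesteps the masking by namespace-qualifying every call; the column-wise data.frame() construction is my own change so that visit_id stays numeric, and the name result is only for illustration:

```r
library(dplyr)

# Build the columns directly; cbind() would coerce visit_id to character
test_df <- data.frame(
  visit_id      = c(1, 1, 1, 1, 2, 2, 3, 3),
  post_pagename = c("A", "B", "C", "D", "A", "A", "A", "B")
)

# dplyr:: prefixes make sure plyr's versions are never picked up
result <- test_df %>%
  dplyr::group_by(post_pagename) %>%
  dplyr::summarise(distinct_visit_ids = dplyr::n_distinct(visit_id))

print(result)
```

With the sample data this should give 3 for A, 2 for B, and 1 each for C and D, matching the expected result in the question. Loading plyr before dplyr, or not attaching plyr at all, should have the same effect as the prefixes.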