-1

I have a large dataset which has two fields that record dates (firstVisit and secondVisit).

I want to find out how many people visited a specific site on any given date, regardless of whether it is their first or second visit.

For example, let's say that on the 1st February 2021 at a particular site, 100 people attended their firstVisit and 25 people attended their secondVisit; I want to be able to say that on 1st February 2021, 125 people visited.

At the moment I have this (using dplyr):

Site     firstVisit    secondVisit
site1    100           25
site2    113           100
site3    84            90

I'm sure that I could just sum the columns up like this:

Site     firstVisit    secondVisit    totalVisits
site1    100           25             125
site2    113           100            213
site3    84            90             174

Ideally I want it to look like this:

Date            Site     TotalVisits
01/01/2020      site1    125
02/01/2020      site1    213
02/02/2020      site2    174
...
27/02/2020      site1    326
27/02/2020      site3    301
28/02/2020      site4    217

Is there a better approach?

How can I do this?

Sample data:

df <- data.frame(
  site = sprintf("site%s",seq(1:5)),
  firstVisit = sample(seq(as.Date('2021/01/01'), as.Date('2021/02/28'), by = "day"), 100, replace = TRUE),
  secondVisit = sample(seq(as.Date('2021/02/01'), as.Date('2021/03/31'), by = "day"), 100, replace = TRUE)
)
Mus
  • 5,994
  • 19
  • 75
  • 112
  • Does this answer your question? [Sum across multiple columns with dplyr](https://stackoverflow.com/questions/28873057/sum-across-multiple-columns-with-dplyr) – AnilGoyal Feb 16 '21 at 11:35
  • @AnilGoyal Not quite as I want to count the number of date-formatted values rather than binary values. – Mus Feb 16 '21 at 11:40
  • So, please can you share some data as [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – AnilGoyal Feb 16 '21 at 11:43
  • https://stackoverflow.com/questions/47590896/r-summing-a-sequence-of-columns-row-wise-with-dplyr – AnilGoyal Feb 16 '21 at 11:45
  • The data is confidential so I can't share anything that I currently have; however, I have a column of `site` names, a column of `firstVisit` dates and a column of `secondVisit` dates and what I want is to know how many visits were made on any date at any given site. The middle example I give in my question is how my data currently looks. – Mus Feb 16 '21 at 11:46
  • Sample dataframe code added. – Mus Feb 16 '21 at 11:55
  • In that case, you'll first need to picot_longer data, then groupby on pivoted column and then find out count – AnilGoyal Feb 16 '21 at 12:37

1 Answers1

1

Do it as follows

set.seed(202)
df <- data.frame(
  site = sprintf("site%s",seq(1:5)),
  firstVisit = sample(seq(as.Date('2021/01/01'), as.Date('2021/02/28'), by = "day"), 100, replace = TRUE),
  secondVisit = sample(seq(as.Date('2021/02/01'), as.Date('2021/03/31'), by = "day"), 100, replace = TRUE)
)

df %>% pivot_longer(-site) %>%
  group_by(site, value) %>%
  summarise(TotalVisits = n())

# A tibble: 158 x 3
# Groups:   site [5]
   site  value      TotalVisits
   <chr> <date>           <int>
 1 site1 2021-01-01           1
 2 site1 2021-01-02           2
 3 site1 2021-01-03           1
 4 site1 2021-01-04           1
 5 site1 2021-01-08           1
 6 site1 2021-01-13           1
 7 site1 2021-01-16           2
 8 site1 2021-01-21           1
 9 site1 2021-01-27           1
10 site1 2021-01-28           1
# ... with 148 more rows

#OR only site-wise

df %>% pivot_longer(-site) %>%
  group_by(site) %>%
  summarise(TotalVisits = n())

# A tibble: 5 x 2
  site  TotalVisits
  <chr>       <int>
1 site1          40
2 site2          40
3 site3          40
4 site4          40
5 site5          40
AnilGoyal
  • 14,820
  • 3
  • 16
  • 30