I have a table called myTable (input):

  user_name   session_num
1     "Joe"             1
2     "Tom"             2
3    "Fred"             1
4     "Tom"             1
5     "Joe"             2
6    "John"             1

I want to find the users (user_name) that have only session_num = 1 (output):

   user_name   session_num   
1     "Fred"             1
2     "John"             1
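
For reproducibility, the input above can be rebuilt with a short sketch like the one below. The column types are an assumption (character names, integer session numbers); the original table may well store user_name as a factor.

```r
# Sample input from the question.
# Assumption: user_name is character and session_num is integer.
myTable <- data.frame(
  user_name   = c("Joe", "Tom", "Fred", "Tom", "Joe", "John"),
  session_num = c(1L, 2L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)
```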
David Arenburg
Smasell

4 Answers

Here's a possible solution using data.table

library(data.table)
setDT(df)[, if(all(session_num == 1)) .SD, by = user_name]
#    user_name session_num
# 1:      Fred           1
# 2:      John           1

Another option is to try an anti join

df[session_num == 1][!df[session_num != 1], on = "user_name"]
#    user_name session_num
# 1:      Fred           1
# 2:      John           1
David Arenburg
  • You may also want to read [this](https://github.com/Rdatatable/data.table/wiki/Getting-started) in order to get more comfortable with `data.table` – David Arenburg Mar 17 '16 at 13:25

A comparable solution with dplyr:

library(dplyr)
myTable %>%
  group_by(user_name) %>%
  filter(all(session_num == 1))

which gives:

  user_name session_num
     (fctr)       (int)
1      Fred           1
2      John           1
Jaap

Alternatively we could simply exclude all users that have a session number other than 1, using base R.

# Users with a session number other than 1
two <- myTable$user_name[myTable$session_num != 1]

# Exclude them
myTable[!myTable$user_name %in% two,]
#  user_name session_num
#3      Fred           1
#6      John           1
mtoto
  • Interestingly we thought about this pretty much at the same time. Though I think using `!=1` instead of `==2` is safer in order to cover all possibilities. – David Arenburg Mar 17 '16 at 12:57
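
The point about `!=1` versus `==2` can be illustrated with a small sketch. The data below is hypothetical and not from the question: it adds a user ("Ann") whose sessions are 1 and 3, so she never has session number 2.

```r
# Hypothetical data: "Ann" has sessions 1 and 3, but never 2
tbl <- data.frame(
  user_name   = c("Ann", "Ann", "Bob"),
  session_num = c(1L, 3L, 1L),
  stringsAsFactors = FALSE
)

# Excluding only users with session_num == 2 finds nobody to exclude,
# so Ann is wrongly kept
users_eq2 <- tbl$user_name[tbl$session_num == 2]
with_eq2  <- tbl[!tbl$user_name %in% users_eq2, ]

# Excluding users with any session_num != 1 removes Ann as intended
users_ne1 <- tbl$user_name[tbl$session_num != 1]
with_ne1  <- tbl[!tbl$user_name %in% users_ne1, ]
```

Here `with_eq2` still contains both of Ann's rows, while `with_ne1` contains only Bob.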

This is a two-line answer:

library(data.table)
data1<-fread("test.csv")
data1[user_name == names(which(table(data1$user_name)==1)),][session_num==1,]

It first finds the users who appear in the dataset only once, and then keeps only those rows where session_num == 1.

Hanjo Odendaal
  • If you replace the `==` with `%in%` after `user_name` you get the expected result. With the latest official release of data.table (v1.9.6) the code throws an error. – RHertel Mar 17 '16 at 12:39
  • Not sure which of the `data.table` specific features this answer uses. – David Arenburg Mar 17 '16 at 12:42
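
Following RHertel's comment, a sketch of the last answer with `%in%` in place of `==` might look like this. The sample data is built inline rather than read from "test.csv", and the names `once` and `result` are introduced here only for illustration.

```r
library(data.table)

# Sample data from the question, built inline instead of read from "test.csv"
data1 <- data.table(
  user_name   = c("Joe", "Tom", "Fred", "Tom", "Joe", "John"),
  session_num = c(1L, 2L, 1L, 1L, 2L, 1L)
)

# Users that appear exactly once in the table
once <- names(which(table(data1$user_name) == 1))

# %in% tests each user_name against the whole vector of names,
# whereas == compares element by element against a shorter vector
result <- data1[user_name %in% once][session_num == 1]
```

Note that this approach only keeps users with exactly one row, so a user with several sessions that are all numbered 1 would be dropped; the `all(session_num == 1)` solutions above do not have that limitation.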