
I'm trying to create a new column that checks, within each group (id and number), whether two columns (classification and classification-1) contain the same observations.

This is the original data frame:

reprex <- tribble(~"id",    ~"number",  ~"year",   ~"classification",          ~"classification-1",
                  5,        7020,    2015,    "Trading de servicios",    "Servicios empresariales",
                  2,        4649,    2015,                 "Trading",                  "Comercial",
                  2,        4649,    2015,               "Comercial",                    "Trading",
                  2,        4649,    2016,                 "Trading",                  "Comercial",
                  2,        4649,    2016,               "Comercial",                    "Trading",
                  3,        4651,      2015,                   "Trading",                    "Comercial",
                  3,        4651,      2015,                   "Trading",                   "Comisiones",
                  3,        4651,      2015,                 "Comercial",                      "Trading",
                  3,        4651,      2015,                 "Comercial",                   "Comisiones")

I want to get this:

reprex <- tribble(~"id",    ~"number",  ~"year",   ~"classification",          ~"classification-1", ~"check",
                  5,        7020,    2015,    "Trading de servicios",    "Servicios empresariales",     TRUE,
                  2,        4649,    2015,                 "Trading",                  "Comercial",     TRUE,
                  2,        4649,    2015,               "Comercial",                    "Trading",     TRUE,
                  2,        4649,    2016,                 "Trading",                  "Comercial",     TRUE,
                  2,        4649,    2016,               "Comercial",                    "Trading",     TRUE,
                  3,        4651,    2015,                 "Trading",                  "Comercial",    FALSE,
                  3,        4651,    2015,                 "Trading",                 "Comisiones",    FALSE,
                  3,        4651,    2015,               "Comercial",                    "Trading",    FALSE,
                  3,        4651,    2015,               "Comercial",                 "Comisiones",    FALSE)
Paula

1 Answer


Perhaps this would help

library(dplyr)
reprex %>%
    group_by(id, number) %>% 
    mutate(check = length(intersect(classification, `classification-1`)) > 0)
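To see what the `intersect` check actually tests, here is a minimal base R sketch using the values from the id = 3, number = 4651 group of the question's data. The vector name `classification_1` is only an illustrative stand-in for the `classification-1` column. The check is TRUE as soon as the two columns share at least one value, even when one column holds a value the other lacks.

```r
# Values from the id = 3, number = 4651 group of the question's data.
# `classification_1` is a hypothetical stand-in for the `classification-1` column.
classification   <- c("Trading", "Trading", "Comercial", "Comercial")
classification_1 <- c("Comercial", "Comisiones", "Trading", "Comisiones")

# Values that appear in both columns
shared <- intersect(classification, classification_1)

# The first solution's check: TRUE whenever at least one value is shared
any_shared <- length(shared) > 0
any_shared  # TRUE

# Values that appear only in the second column
only_in_second <- setdiff(classification_1, classification)
only_in_second  # "Comisiones"
```

So this version answers "do the columns overlap at all?", which is a weaker condition than "do the columns contain the same values?".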

Or, if we need to check all the unique elements: after grouping by 'id' and 'number', get the unique elements of both classification and classification-1, and check whether they are equal with setequal

reprex %>%
    group_by(id, number) %>%
    mutate(check = setequal(sort(unique(classification)), 
                              sort(unique(`classification-1`))))
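As a rough way to compare the two checks side by side, the sketch below rebuilds the question's input data and computes one row per group. The column names `any_shared` and `same_set` are made up for this illustration, and the `.groups` argument assumes dplyr 1.0.0 or later. `setequal` is called without `sort(unique(...))` here because it already ignores order and duplicates.

```r
library(dplyr)
library(tibble)

# Same rows as the question's input data
reprex <- tribble(
  ~id, ~number, ~year, ~classification,        ~`classification-1`,
  5,   7020,    2015,  "Trading de servicios", "Servicios empresariales",
  2,   4649,    2015,  "Trading",              "Comercial",
  2,   4649,    2015,  "Comercial",            "Trading",
  2,   4649,    2016,  "Trading",              "Comercial",
  2,   4649,    2016,  "Comercial",            "Trading",
  3,   4651,    2015,  "Trading",              "Comercial",
  3,   4651,    2015,  "Trading",              "Comisiones",
  3,   4651,    2015,  "Comercial",            "Trading",
  3,   4651,    2015,  "Comercial",            "Comisiones"
)

# One row per (id, number) group, with both checks next to each other.
# `any_shared` and `same_set` are illustrative names, not part of the answer.
res <- reprex %>%
    group_by(id, number) %>%
    summarise(
        any_shared = length(intersect(classification, `classification-1`)) > 0,
        same_set   = setequal(classification, `classification-1`),
        .groups    = "drop"
    )

res
```

On this data, `same_set` should be TRUE only for id = 2: the id = 3 group has "Comisiones" in one column only, and the two values in the id = 5 group differ.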
akrun
  • How is this kind of data read into R? Also can one use `identical` here? – NelsonGon May 17 '19 at 14:43
  • @NelsonGon Initially, I used `identical`, but it is also picky about attributes, so `setequal` might be more useful. I think the OP forgot to add `tribble` for the first case :=) – akrun May 17 '19 at 14:44
  • @akrun The solution works well in general, but it fails in cases like the last four rows. – Paula May 17 '19 at 14:59
  • @Paula The second solution gives me `FALSE` for id = 3 and number = 4651 (as shown in the expected output). For id = 5 and number = 7020, check should be FALSE (based on your updated dataset). – akrun May 17 '19 at 15:03