R: Aggregate character strings

Question

I have a data frame ModelDF having columns with numeric as well as character values like:

Quantity        Type        Mode        Company
   1            Shoe        hello        Nike
   1            Shoe        hello        Nike
   2            Jeans       hello        Levis
   3            Shoe        hello        Nike
   1            Jeans       hello        Levis
   1            Shoe        hello        Adidas
   2            Jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            Jeans       ahola        Levis

I have to aggregate it in this form

Quantity        Type        Mode        Company
   5            Shoe        hello        Nike
   3            jeans       hello        Levis
   1            Shoe        hello        adidas
   2            jeans       hello        Spykar
   1            Shoe        ahola        Nike
   1            jeans       ahola        Levis

i.e. I have to aggregate and sum Quantity if all other columns are same.

I have tried it using aggregate but as it doesn't work on character values it is giving me wrong results.

What are my options? Thanks

Sotos · Answer 1 · 2016-05-12T08:02:31.603

aggregate(Quantity ~ Type + Mode + Company, df, sum)
#   Type  Mode Company Quantity
#1  Shoe hello  Adidas        1
#2 Jeans ahola   Levis        1
#3 Jeans hello   Levis        3
#4  Shoe ahola    Nike        1
#5  Shoe hello    Nike        5
#6 Jeans hello  Spykar        2

You can also try the data.table option:

setDT(df)[, .(Sum.Quantity = sum(Quantity)), by = list(Type, Mode, Company)]

#    Type  Mode Company Sum.Quantity
#1:  Shoe hello    Nike            5
#2: Jeans hello   Levis            3
#3:  Shoe hello  Adidas            1
#4: Jeans hello  Spykar            2
#5:  Shoe ahola    Nike            1
#6: Jeans ahola   Levis            1

Similarly with dplyr

df %>% 
  group_by(Type, Mode, Company) %>% 
               summarise(sum(Quantity))

DATA

dput(df)
structure(list(Quantity = c(1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L, 1L
), Type = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("Jeans", 
"Shoe"), class = "factor"), Mode = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L), .Label = c("ahola", "hello"), class = "factor"), 
    Company = structure(c(3L, 3L, 2L, 3L, 2L, 1L, 4L, 3L, 2L), .Label = c("Adidas", 
    "Levis", "Nike", "Spykar"), class = "factor")), .Names = c("Quantity", 
"Type", "Mode", "Company"), class = "data.frame", row.names = c(NA, 
-9L))

I have tried this but it is giving me error as "sum is not meaningful for factors" — Looper, May 12 '16 at 07:53
You probably have Quantity coded as a `factor` variable. Check `class(data$Quantity)` and if it is a factor, try `as.integer` and if it gives warnings then you want to be careful with those. — asb, May 12 '16 at 07:59
Yeah..I get it now!! I didn't covert the factors to numeric and character. Thanks for the help. — Looper, May 12 '16 at 08:00

score 0 · Answer 2 · answered May 12 '16 at 07:54

You don't want to 'aggregate strings', you want to aggregate numerics 'by' string variables. Here:

R> xx = data.frame(a=sample(letters[1:3], 10, TRUE),
                   b=sample(LETTERS[1:3], 10, TRUE),
                   c=runif(10))
R> xx
a b         c
1  b C 0.7094221
2  c B 0.2718095
3  c B 0.8844701
4  b C 0.9270141
5  b C 0.8243021
6  a A 0.3649902
7  a B 0.9763228
8  a A 0.8904676
9  b C 0.8640352
10 a A 0.7931683
R> aggregate(c ~ a + b, data=xx, FUN=sum)
a b         c
1 a A 2.0486261
2 a B 0.9763228
3 c B 1.1562796
4 b C 3.3247736

R: Aggregate character strings

2 Answers2

Linked