I have a large data frame with one column (Phylum) that contains repeated names and 253 other columns (each with a unique name) that contain counts for the Phylum in each row. I would like to sum the counts in each column for each Phylum.

This is a simplified version of what my data look like:

     Phylum    sample1    sample2    sample3 ...    sample253
1    P1        2          3          5              5
2    P1        2          2          10             2
3    P2        1          0          0              1
4    P3        10         12         3              1
5    P3        5          7          14             15

I have seen similar questions, but they are for fewer columns, where you can just list the names of the columns you want summed. I don't want to enter 253 unique column names.

I would like my results to look like this:

    Phylum    sample1    sample2    sample3 ...    sample253
1   P1        4          5          15             7
2   P2        1          0          0              1
3   P3        15         19         17             16

I would appreciate any help. Sorry for the format of the question; this is my first time asking for help on Stack Overflow (rather than sleuthing).

KhadLily

1 Answer

If your starting file looks like this (test.csv):

Phylum,sample1,sample2,sample3,sample253
P1,2,3,5,5
P1,2,2,10,2
P2,1,0,0,1
P3,10,12,3,1
P3,5,7,14,15

Then you can use group_by and summarise_each from dplyr:

read_csv('test.csv') %>% 
  group_by(Phylum) %>% 
  summarise_each(funs(sum))

(I first loaded tidyverse with library(tidyverse).)
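As the comments on this answer point out, summarise_each() is deprecated, and summarise_all() is the replacement the asker ended up using. The following is a minimal sketch of that variant. It assumes dplyr is installed and builds the example data inline (the name df is only illustrative) so that no test.csv is needed:

```r
library(dplyr)

# Inline copy of the example data, so the sketch runs without test.csv
df <- data.frame(
  Phylum    = c("P1", "P1", "P2", "P3", "P3"),
  sample1   = c(2, 2, 1, 10, 5),
  sample2   = c(3, 2, 0, 12, 7),
  sample3   = c(5, 10, 0, 3, 14),
  sample253 = c(5, 2, 1, 1, 15)
)

# Sum every non-grouping column within each Phylum
result <- df %>%
  group_by(Phylum) %>%
  summarise_all(sum)

print(result)
```

For the example data this should give one row per Phylum, matching the table requested in the question, without naming any of the sample columns.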

Note that if you were trying to do this for just one column, you could simply use summarise:

read_csv('test.csv') %>% 
  group_by(Phylum) %>% 
  summarise(sum(sample1))

summarise_each is what applies that function (funs(sum) in the first example) to every column rather than to a single named one.
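Two other ways to apply sum to every column at once are sketched below. This is not part of the original answer: the first form assumes dplyr 1.0.0 or later, where across() is available, and the second uses only base R's aggregate(). The data frame df is an illustrative inline copy of the example data.

```r
library(dplyr)

# Illustrative inline copy of the example data
df <- data.frame(
  Phylum    = c("P1", "P1", "P2", "P3", "P3"),
  sample1   = c(2, 2, 1, 10, 5),
  sample2   = c(3, 2, 0, 12, 7),
  sample3   = c(5, 10, 0, 3, 14),
  sample253 = c(5, 2, 1, 1, 15)
)

# dplyr >= 1.0.0 (assumed): across(everything(), sum) covers all
# columns except the grouping column
by_across <- df %>%
  group_by(Phylum) %>%
  summarise(across(everything(), sum))

# Base R: "." on the left-hand side stands for every column
# other than Phylum
by_aggregate <- aggregate(. ~ Phylum, data = df, FUN = sum)

print(by_across)
print(by_aggregate)
```

Both should return one row per Phylum with the per-column sums. Note that the formula form of aggregate() drops rows containing NA by default, so the base R version may behave differently from the dplyr one if the real data have missing counts.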

snd
  • I received the error: `summarise_each()` is deprecated. So instead I used summarise_all() and it seemed to work! Thank you :) – KhadLily Jan 04 '19 at 01:34
  • Oh, I didn't notice that error: "`summarise_each()` is deprecated. Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead. To map `funs` over all variables, use `summarise_all()`" ... thanks for pointing that out :) – snd Jan 04 '19 at 01:55