How to process hundreds of CSV files and write a single "summary statistics" CSV, with one row per file?

Question

I have hundreds of *.csv files. I would like to compute some summary statistics for each one and then record these statistics in a single data frame/CSV file, with one row per input CSV.

Let's say one of the files contains the built-in mtcars data frame from base R:

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

....

For example, I might want to record the mean of mpg: mean(mtcars$mpg) is 20.09062.
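A minimal base-R sketch of the per-file step: a hypothetical helper, summarise_file() (the name is illustrative, not from the question), that reduces one data frame to a single-row data frame of statistics. It assumes each CSV has a numeric mpg column; na.rm = TRUE is an added assumption so that missing values don't turn a statistic into NA.

```r
# Hypothetical helper: reduce one data frame to a single row of statistics.
# Assumes a numeric `mpg` column; add more columns for other statistics.
summarise_file <- function(df) {
  data.frame(
    mean_mpg = mean(df$mpg, na.rm = TRUE),
    max_mpg  = max(df$mpg, na.rm = TRUE)
  )
}

# mtcars ships with base R (datasets package), so this runs as-is.
one_row <- summarise_file(mtcars)
print(one_row)
```

For mtcars this gives mean_mpg = 20.09062 and max_mpg = 33.9, matching the mtcars values in the question.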

The resulting data frame would then have one row per file, like this:

         mean_mpg  max_mpg  ...
mtcars      20.09     33.9  ...
df2           232     92.7  ...
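One common base-R way to get a table laid out like this is to build one single-row data frame per data set and stack them with do.call(rbind, ...). This sketch stores the identifier in an explicit dataset column rather than in row names, since a column survives write.csv(..., row.names = FALSE) and later merges more predictably. The helper summarise_file() is hypothetical, and df2 here is an illustrative stand-in (the 4-cylinder rows of mtcars), not the asker's data.

```r
# Hypothetical helper: one row of statistics per data frame.
summarise_file <- function(df) {
  data.frame(mean_mpg = mean(df$mpg), max_mpg = max(df$mpg))
}

# Named list of data sets; df2 is an illustrative subset, not real input data.
datasets <- list(mtcars = mtcars, df2 = mtcars[mtcars$cyl == 4, ])

# One single-row data frame per data set, tagged with its name.
rows <- lapply(names(datasets), function(name) {
  data.frame(dataset = name, summarise_file(datasets[[name]]),
             stringsAsFactors = FALSE)
})

# Stack the rows into one summary table (one row per data set).
summary_df <- do.call(rbind, rows)
print(summary_df)
```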

I know how to glob all of the *.csv files in the working directory with files <- Sys.glob("*.csv") and loop over them:

for (file in files) {
    df <- read.csv(file)
    mean_mpg <- mean(df$mpg)  # mean mpg for this file
    # .... (other statistics)
}
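The loop can be completed by collecting one row per file in a preallocated list and binding the list once at the end, which avoids growing a data frame inside the loop. This is a sketch under assumptions: the inputs sit in a hypothetical directory input_dir (here a temporary directory filled with example copies of mtcars so the block runs on its own), every file has an mpg column, and the output name summary.csv is illustrative. list.files(..., full.names = TRUE) is used instead of Sys.glob() so the directory can be given explicitly.

```r
# Set up a hypothetical input directory with two example CSV files.
input_dir <- file.path(tempdir(), "csv_input")
dir.create(input_dir, showWarnings = FALSE)
write.csv(mtcars, file.path(input_dir, "mtcars.csv"), row.names = FALSE)
write.csv(mtcars[mtcars$cyl == 4, ], file.path(input_dir, "four_cyl.csv"),
          row.names = FALSE)

# All CSV files in the directory, with their full paths.
files <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)

# Preallocate a list with one slot per file, then fill it in the loop.
rows <- vector("list", length(files))
for (i in seq_along(files)) {
  df <- read.csv(files[i])
  rows[[i]] <- data.frame(
    file     = basename(files[i]),   # identifier for this row
    mean_mpg = mean(df$mpg, na.rm = TRUE),
    max_mpg  = max(df$mpg, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}

# Bind all rows into one data frame and write the summary CSV once.
summary_df <- do.call(rbind, rows)
out_file <- file.path(tempdir(), "summary.csv")
write.csv(summary_df, out_file, row.names = FALSE)
```

Writing the summary outside input_dir keeps summary.csv from matching the pattern and being summarised as an input on a later run.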

Now I'm stuck: how do I write these values as a row of one big summary CSV?
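If you literally want to write each file's values to the summary CSV as soon as they are computed (for example, so partial results survive if the loop fails partway through hundreds of files), write.table() with append = TRUE can do that. write.csv() is not suitable here because it ignores attempts to change append and col.names (with a warning). The directory, file names, and mpg column below are assumptions for the sketch.

```r
# Hypothetical input directory with two example CSV files.
input_dir <- file.path(tempdir(), "csv_append_demo")
dir.create(input_dir, showWarnings = FALSE)
write.csv(mtcars, file.path(input_dir, "a.csv"), row.names = FALSE)
write.csv(mtcars[1:5, ], file.path(input_dir, "b.csv"), row.names = FALSE)

out_file <- file.path(tempdir(), "summary_appended.csv")
unlink(out_file)  # start from an empty summary file

for (file in list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)) {
  df <- read.csv(file)
  one_row <- data.frame(file     = basename(file),
                        mean_mpg = mean(df$mpg, na.rm = TRUE),
                        max_mpg  = max(df$mpg, na.rm = TRUE))
  # Write the header only when creating the file; afterwards just append rows.
  new_file <- !file.exists(out_file)
  write.table(one_row, out_file, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
}

summary_df <- read.csv(out_file)
print(summary_df)
```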

(Sorry for the n00b question, but I'm a bit lost)

[(1) Put the filenames in a list; (2) read the files; (3) bind them together and make sure to create an id-column on the fly](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918) and finally (4) [summarise by group (the id-column)](http://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame) — Jaap, Apr 11 '17 at 20:25
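A base-R sketch of the approach in Jaap's comment: list the files, read each one, bind them into one long data frame with an id column recording the source file, then summarise by that id with aggregate(). The temporary directory and example files are stand-ins so the block runs on its own, and it assumes all files share the same column names (rbind() on data frames requires that).

```r
# Hypothetical input directory with two example CSV files.
input_dir <- file.path(tempdir(), "csv_grouped_demo")
dir.create(input_dir, showWarnings = FALSE)
write.csv(mtcars, file.path(input_dir, "mtcars.csv"), row.names = FALSE)
write.csv(mtcars[mtcars$am == 1, ], file.path(input_dir, "manual.csv"),
          row.names = FALSE)

# (1) Put the file names in a vector.
files <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)

# (2) Read each file and (3) bind them, creating the id column on the fly.
all_data <- do.call(rbind, lapply(files, function(f) {
  df <- read.csv(f)
  df$file <- basename(f)   # id column: which file each row came from
  df
}))

# (4) Summarise by the id column, one statistic per aggregate() call.
means <- aggregate(mpg ~ file, data = all_data, FUN = mean)
maxes <- aggregate(mpg ~ file, data = all_data, FUN = max)
names(means)[2] <- "mean_mpg"
names(maxes)[2] <- "max_mpg"

# One row per file; write the summary outside the input directory.
summary_df <- merge(means, maxes, by = "file")
write.csv(summary_df, file.path(tempdir(), "summary_grouped.csv"),
          row.names = FALSE)
print(summary_df)
```

This keeps every file in memory at once, which is the main trade-off against the per-file loop when the files are large.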
@Jaap Thank you for this! This is a very comprehensive link-answer — ShanZhengYang, Apr 11 '17 at 20:34
Generally best to ask one question at a time. The only reason this isn't closed as a dupe already is because you've shoehorned two questions into it. — Frank, Apr 11 '17 at 20:36
I mean: "I would like to ... and then ...". Each of those parts has been asked many times, but it is hard to find an exact dupe with that sequence of tasks. It is preferable to link to dupes, but hard because of how you've structured it (lumping these two together). These two parts correspond to the two links Jaap gave (re reading tables and summarising tables). Not a big deal, just fyi for future posts. — Frank, Apr 11 '17 at 20:38
