
I have hundreds of *.csv files. I would like to crunch some summary statistics for each one and then record these statistics in a single data frame/CSV file, with one row per input CSV.

Let's say one of them is the built-in `mtcars` data frame from base R:

> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2

....

I might want to record the mean of `mpg`, i.e. `mean(mtcars$mpg)`, which is 20.09062.

The resulting summary data frame would then have one row per file:

          mean_mpg   max_mpg   ...
mtcars       20.09      33.9   ...
df2            232      92.7   ...
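A minimal sketch of the per-file step, assuming every CSV has a numeric `mpg` column. The helper name `summarise_mpg()` is hypothetical; it reduces one data frame to a one-row data frame whose columns match the header above.

```r
# Hypothetical helper: collapse one data frame into a single summary row.
# Assumes `df` has a numeric column named `mpg`.
summarise_mpg <- function(df) {
  data.frame(
    mean_mpg = mean(df$mpg),
    max_mpg  = max(df$mpg)
  )
}

# Apply it to the built-in mtcars data set and label the row
mtcars_row <- summarise_mpg(mtcars)
rownames(mtcars_row) <- "mtcars"
print(mtcars_row)
```

Adding more statistics means adding more named columns inside `data.frame()`.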

I know how to glob all of the *.csv files in a certain path with `files <- Sys.glob("*.csv")` and then loop over them:

for (file in files) {
    df <- read.csv(file)
    mean_mpg <- mean(df$mpg)
    # ....
}
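On the globbing step: `Sys.glob("*.csv")` only matches files in the current working directory. To target a specific folder, the directory can be part of the pattern; `list.files()` with a regular expression and `full.names = TRUE` is an equivalent base-R option. In this sketch `csv_dir` is a hypothetical stand-in for the real folder, and the two sample files exist only for illustration.

```r
# Illustration only: a throwaway folder with two sample CSV files.
# Replace `csv_dir` with the folder that holds your real CSVs.
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars[1:5, ],  file.path(csv_dir, "a.csv"), row.names = FALSE)
write.csv(mtcars[6:10, ], file.path(csv_dir, "b.csv"), row.names = FALSE)

# Glob with the directory included in the pattern
files <- Sys.glob(file.path(csv_dir, "*.csv"))

# Equivalent: list.files() with a regex, returning full paths
files2 <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

print(basename(files))
```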

Now, I'm stuck. How do I write these values into a row for a giant summary csv?

(Sorry for the n00b question, but I'm a bit lost)
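One way to finish the loop, as a sketch: keep each file's one-row summary in a list, stack the rows with `do.call(rbind, ...)` after the loop, and save the result with `write.csv()`. Calling `rbind()` on a growing data frame inside the loop also works, but it copies the table on every iteration. The sample folder, the file names `cars1.csv` to `cars3.csv`, and the output name `summary.csv` are all hypothetical.

```r
# Illustration only: a temporary folder with three sample CSV files
csv_dir <- file.path(tempdir(), "csv_loop_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (i in 1:3) {
  rows <- seq(1 + 10 * (i - 1), 10 * i)
  write.csv(mtcars[rows, ], file.path(csv_dir, paste0("cars", i, ".csv")),
            row.names = FALSE)
}

files <- Sys.glob(file.path(csv_dir, "*.csv"))

# One slot per file; each slot will hold a one-row data frame
rows_list <- vector("list", length(files))

for (k in seq_along(files)) {
  df <- read.csv(files[k])
  rows_list[[k]] <- data.frame(
    file     = basename(files[k]),
    mean_mpg = mean(df$mpg),
    max_mpg  = max(df$mpg)
  )
}

# Stack the rows into one summary table
summary_df <- do.call(rbind, rows_list)

# Write the summary outside csv_dir so a re-run does not glob it as input
out_file <- file.path(tempdir(), "summary.csv")
write.csv(summary_df, out_file, row.names = FALSE)
print(summary_df)
```

Keeping the file name as a column (rather than as row names) means it survives `write.csv(..., row.names = FALSE)`.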

ShanZhengYang
  • [(1) Put the filenames in a list; (2) read the files; (3) bind them together and make sure to create an id-column on the fly](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918) and finally (4) [summarise by group (the id-column)](http://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame) – Jaap Apr 11 '17 at 20:25
  • http://stackoverflow.com/a/36901707/5133721 for an example – Carl Boneri Apr 11 '17 at 20:26
  • @CarlBoneri there is nothing in it about summarising ... – Jaap Apr 11 '17 at 20:28
  • teach a man to fish..... – Carl Boneri Apr 11 '17 at 20:28
  • @CarlBoneri Thanks for the fish! Got it now – ShanZhengYang Apr 11 '17 at 20:33
  • best thing I've heard all week haha. good stuff! – Carl Boneri Apr 11 '17 at 20:34
  • @Jaap Thank you for this! This is a very comprehensive link-answer – ShanZhengYang Apr 11 '17 at 20:34
  • Generally best to ask one question at a time. The only reason this isn't closed as a dupe already is because you've shoehorned two questions into it. – Frank Apr 11 '17 at 20:36
  • @Frank I'm confused. Isn't there one question here? – ShanZhengYang Apr 11 '17 at 20:36
  • I mean: "I would like to ... and then ...". Each of those parts has been asked many times, but it is hard to find an exact dupe with that sequence of tasks. It is preferable to link to dupes, but hard because of how you've structured it (lumping these two together). These two parts correspond to the two links Jaap gave (re reading tables and summarising tables). Not a big deal, just fyi for future posts. – Frank Apr 11 '17 at 20:38
  • @Frank Understood. I'll try to be more succinct next time. – ShanZhengYang Apr 11 '17 at 22:03
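A base-R sketch of Jaap's four steps, using no add-on packages; it is not necessarily what the linked answers do. The file name serves as the id column, and `aggregate()` does the per-group summary. The temporary folder and the files `first.csv` and `second.csv` are illustrative stand-ins.

```r
# Illustration only: a temporary folder with two sample CSV files
csv_dir <- file.path(tempdir(), "csv_bind_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(mtcars[1:16, ],  file.path(csv_dir, "first.csv"),  row.names = FALSE)
write.csv(mtcars[17:32, ], file.path(csv_dir, "second.csv"), row.names = FALSE)

# (1) Put the file names in a vector, named by file
files <- Sys.glob(file.path(csv_dir, "*.csv"))
names(files) <- basename(files)

# (2) Read every file; lapply keeps the names, giving a named list
dfs <- lapply(files, read.csv)

# (3) Bind them together, adding an id column taken from the list names
combined <- do.call(rbind, Map(function(df, id) cbind(id = id, df),
                               dfs, names(dfs)))

# (4) Summarise by group (the id column), one statistic per aggregate() call
means <- aggregate(mpg ~ id, data = combined, FUN = mean)
maxes <- aggregate(mpg ~ id, data = combined, FUN = max)
names(means)[2] <- "mean_mpg"
names(maxes)[2] <- "max_mpg"
summary_df <- merge(means, maxes, by = "id")
print(summary_df)
```

Packages such as data.table (`rbindlist(idcol = ...)`) or dplyr (`bind_rows(.id = ...)`) can shorten step (3).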

0 Answers