The following is an example of how I want to treat my data sets. It might be a bit difficult to understand how my data frame is structured, but I hope it makes sense:

First, density must be calculated for columns A, B, and C using raw data from columns ADry, AEthanol, BDry, and so on. (Since these were defined earlier as vectors too, I used the vectors instead of the data frame columns because it was shorter: ADry_1_0 instead of Sample_1_0$ADry_1_0.)

Sample_1_0$ADensi_1_0=(ADry_1_0/(ADry_1_0-AEthanol_1_0))*(peth-pair)+pair 
Sample_1_0$BDensi_1_0=(BDry_1_0/(BDry_1_0-BEthanol_1_0))*(peth-pair)+pair
Sample_1_0$CDensi_1_0=(CDry_1_0/(CDry_1_0-CEthanol_1_0))*(peth-pair)+pair

This yields 10 densities each for A, B, and C. What we are interested in is the mean density:

Mean_1_0=apply(Sample_1_0[7:9],2,mean)

Next, standard deviations are calculated. We are mainly interested in the standard deviations of our raw data columns (ADry and AEthanol), because they feed into the error propagation that determines how the deviations combine in the calculated densities:

StdAfv_1_0=apply(Sample_1_0,2,sd)

Error propagation (same for B and C)

ASd_1_0=(sqrt((sd(Sample_1_0$ADry_1_0)/mean(Sample_1_0$ADry_1_0))^2+(sqrt((sd(Sample_1_0$ADry_1_0)^2+sd(Sample_1_0$AEthanol_1_0)^2))/(mean(Sample_1_0$ADry_1_0)-mean(Sample_1_0$AEthanol_1_0)))^2))*mean(Sample_1_0$ADensi_1_0)
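Written out as formulas, the density and error propagation lines above appear to correspond to the following. The symbols are introduced here for illustration only: $m_d$ and $m_e$ stand for the dry and ethanol columns, $s$ for a sample standard deviation, bars for means, and $\rho_{eth}$, $\rho_{air}$ for `peth` and `pair` (assumed to be the densities of ethanol and air).

```latex
\rho = \frac{m_d}{m_d - m_e}\,(\rho_{eth} - \rho_{air}) + \rho_{air}

\sigma_\rho = \bar{\rho}\,
  \sqrt{\left(\frac{s_{m_d}}{\bar{m}_d}\right)^{2}
      + \left(\frac{\sqrt{s_{m_d}^{2} + s_{m_e}^{2}}}{\bar{m}_d - \bar{m}_e}\right)^{2}}
```

Note that the square root spans the sum of both squared relative errors, which matches the placement of the parentheses in the R line.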

In the end we semi-manually gathered the final results (mean density and its deviation) in a plottable data frame. Some of the code might be a tad long, and maybe we could have achieved the same results with shorter code, but bear with us, we are rookies.

Now to the actual problem:

This was for A_1_0, B_1_0, and C_1_0. We would like to apply the same series of commands to 15 other data frames. The dimensions are the same, and they will be named A_1_1, A_1_2, A_2_0 and so on.

Is it possible to use some kind of loop, or to make a loadable script containing x and y placeholders where we can easily insert A_1_1, for instance?

Thanks in advance. I tried to keep the confusion to a minimum, although it's tough!

[Image: Data list]

  • You're making things difficult for yourself by sequentially naming variables, encoding information in the object name. Instead, you [should be using a list of data frames](http://stackoverflow.com/a/24376207/903061). It is easy to iterate over a list and store results in another list. – Gregor Thomas Mar 19 '17 at 23:02
  • Alternatively, if your data won't all fit in memory at the same time, write a function where the file name is the input parameter, and iterate over the file names rather than over R object names. – Gregor Thomas Mar 19 '17 at 23:25
  • Thanks a lot Gregor, just read the whole thing about list of data frames, and i will try it out right away. What is the best way to deal with that really long line of calculation regarding error propagation?? – Chemistry101 Mar 20 '17 at 07:57
  • @Gregor i managed to get the list made etc., but i think i'm too new to R to handle these code-heavy solutions. Thanks a bunch for your help, let me know if i'm really close, if not i'll just go back to the slavery with tons of data frames.... Added picture of my lists to the bottom of original post!! – Chemistry101 Mar 20 '17 at 10:29
  • Your list of data frames looks correct. Now you should be able to apply that to the function i gave below. You can add the error propagation function into the `list (mean = mean (x), sd = sd (x) )` – tbradley Mar 20 '17 at 10:56
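A minimal base R sketch of the list-of-data-frames idea from the comments above. The numbers are invented and the names `samples`, `summarise_sample`, and `summary_table` are hypothetical, since the real data are not shown; the column names are shortened to `ADry` and `AEthanol` for the same reason.

```r
# Hypothetical toy data: two samples with identical columns (values are invented).
samples <- list(
  Sample_1_0 = data.frame(ADry = c(2.0, 2.1, 1.9), AEthanol = c(1.20, 1.26, 1.14)),
  Sample_1_1 = data.frame(ADry = c(3.0, 3.1, 2.9), AEthanol = c(1.80, 1.86, 1.74))
)

# Mean and standard deviation of every column in one data frame.
summarise_sample <- function(df) {
  data.frame(column = names(df),
             mean = vapply(df, mean, numeric(1)),
             sd = vapply(df, sd, numeric(1)),
             row.names = NULL)
}

# Apply the same function to every sample and label each result with its list name.
per_sample <- lapply(names(samples), function(nm) {
  cbind(sample = nm, summarise_sample(samples[[nm]]))
})

# Stack the per-sample results into one data frame.
summary_table <- do.call(rbind, per_sample)
print(summary_table)
```

The sample identity lives in the list names rather than in the object names, so adding a 17th data frame means adding one list element, not another copy of the script.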

1 Answer

If, instead of individual vectors, you combine the raw data into data frames (or, even better, data.tables) and then store the data frames for all runs in a list, as @Gregor suggested, you can apply the function below to every element with lapply.

library(data.table)

# Summarise one data set: mean density, sd of density, and propagated error for A, B, and C.
# As the indices imply, columns 1-3 are taken as the dry weights and columns 4-6 as the ethanol weights.
my_func <- function(dataset, peth, pair){
  names <- names(dataset)
  dt <- as.data.table(dataset)  # work on a copy so the input list is not modified
  dt[, `:=` (ADens = (get(names[1])/(get(names[1])-get(names[4])))*(peth-pair)+pair,
             BDens = (get(names[2])/(get(names[2])-get(names[5])))*(peth-pair)+pair,
             CDens = (get(names[3])/(get(names[3])-get(names[6])))*(peth-pair)+pair)
     ][,  .(ADens_mean = mean(ADens),
            ADens_sd = sd(ADens),
            # sqrt() spans the sum of both squared terms, as in the formula from the question
            AErr = sqrt((sd(get(names[1]))/mean(get(names[1])))^2 +
                        (sqrt(sd(get(names[1]))^2 + sd(get(names[4]))^2)/
                           (mean(get(names[1])) - mean(get(names[4]))))^2) * mean(ADens),
            BDens_mean = mean(BDens),
            BDens_sd = sd(BDens),
            BErr = sqrt((sd(get(names[2]))/mean(get(names[2])))^2 +
                        (sqrt(sd(get(names[2]))^2 + sd(get(names[5]))^2)/
                           (mean(get(names[2])) - mean(get(names[5]))))^2) * mean(BDens),
            CDens_mean = mean(CDens),
            CDens_sd = sd(CDens),
            CErr = sqrt((sd(get(names[3]))/mean(get(names[3])))^2 +
                        (sqrt(sd(get(names[3]))^2 + sd(get(names[6]))^2)/
                           (mean(get(names[3])) - mean(get(names[6]))))^2) * mean(CDens))
       ]
}

rbindlist(lapply(list_datasets, my_func, peth = 2, pair = 1))

Now, this assumes that your raw vectors are in data frames with the columns in the order in which they appeared in your example (and that they are the only columns in the data set). If this is not the case, you may just have to edit the indices in the names[x] calls. If you want a little more flexibility, you could also define a list of lists with the column names for each of your raw data sets, add it as an argument to my_func, and then replace each get(names[x]) with get(list_column_names[x]).
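The column-name idea could be sketched as follows. This is a base R illustration rather than data.table, and every name in it (`density_summary`, `summarise_by_name`, `col_map`, the toy columns and values) is hypothetical. The error term follows the formula from the question, with the square root spanning both squared terms.

```r
# Hypothetical helper: mean density, sd, and propagated error for one dry/ethanol column pair.
density_summary <- function(df, dry_col, eth_col, peth, pair) {
  dry <- df[[dry_col]]
  eth <- df[[eth_col]]
  dens <- dry / (dry - eth) * (peth - pair) + pair
  rel_err <- sqrt((sd(dry) / mean(dry))^2 +
                  (sqrt(sd(dry)^2 + sd(eth)^2) / (mean(dry) - mean(eth)))^2)
  c(mean = mean(dens), sd = sd(dens), err = rel_err * mean(dens))
}

# Look columns up by name, so their order in the data frame no longer matters.
summarise_by_name <- function(df, col_map, peth, pair) {
  out <- lapply(col_map, function(p) {
    density_summary(df, p[["dry"]], p[["ethanol"]], peth, pair)
  })
  unlist(out)  # names become A.mean, A.sd, A.err, B.mean, ...
}

# Hypothetical mapping from each series to its raw data columns.
col_map <- list(
  A = c(dry = "ADry", ethanol = "AEthanol"),
  B = c(dry = "BDry", ethanol = "BEthanol")
)

# Invented data with the columns deliberately out of order.
toy <- data.frame(
  BEthanol = c(1.5, 1.6, 1.4), ADry = c(2.0, 2.2, 1.8),
  AEthanol = c(1.0, 1.1, 0.9), BDry = c(3.0, 3.2, 2.8)
)

res <- summarise_by_name(toy, col_map, peth = 2, pair = 1)
print(res)
```

Passing the mapping as an argument keeps the formula in one place, so a data set with different column names only needs a different `col_map`.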

Combined with rbindlist, this should output a data.table with one row for each of the 16 data sets and 9 columns (ADens_mean, ADens_sd, AErr, ...).

NOTE: since there was no actual data to work with, I can't say for sure that this function does exactly what you want, but I think it will be close. You will also need to install the data.table package.

tbradley
  • I think you are very close, i'll try what you wrote and what Gregor suggested out in R. I might return to you in case of failure... – Chemistry101 Mar 20 '17 at 08:07
  • @Chemistry101 check out the edited function, this will give you all of the calculations for each data frame in your list as a single row in the output `data.table`. It is a little more cumbersome to look at, but it should get the job done and run pretty quickly – tbradley Mar 20 '17 at 12:51
  • @Chemistry101 to clarify my last comment (which when re read, seemed it could be confusing) each data frame in the input list will correspond with a single row in the output `data.table`, i.e. if you have 16 data frames then your output will be 16 rows – tbradley Mar 20 '17 at 13:34
  • Thanks a lot, an output of 16 rows for my 16 data frames is exactly what i need. I am busy at the moment, but will try it out later and hopefully succeed! – Chemistry101 Mar 20 '17 at 14:25