I would like to select specific elements of a data.list after processing it.

To make the processing steps clear, I describe my problem with a reproducible example. In the example code below, I have three data sets in data.list, each with 5 columns.

Each data set contains three replicates, and each one is assigned a unique number called set_nbr that identifies it.

# Create reproducible data: three data sets, each holding 3 replicates of
# sorted Mx, My and Mz values, along with set_nbr
set.seed(1)
data.list <- lapply(1:3, function(x) {
  nrep <- 3
  time <- rep(seq(90, 54000, length.out = 600), times = nrep)
  Mx <- c(replicate(nrep, sort(runif(600, -0.014, 0.012), decreasing = TRUE)))
  My <- c(replicate(nrep, sort(runif(600, -0.02, 0.02), decreasing = TRUE)))
  Mz <- c(replicate(nrep, sort(runif(600, -1, 1), decreasing = TRUE)))
  data.frame(time, Mx, My, Mz, set_nbr = x)
})

After applying a function, I get output like this:

 result

       time     Mz           set_nbr
 1  27810 -1.917835e-03       1
 2  28980 -1.344288e-03       1
 3  28350 -3.426615e-05       1
 4  27900 -9.934413e-04       1
 5  25560 -1.016492e-02       2
 6  27360 -4.790767e-03       2
 7  28080 -7.062256e-04       2
 8  26550 -1.171716e-04       2
 9  26820 -2.495893e-03       3
 10 26550 -7.397865e-03       3
 11 26550 -2.574022e-03       3
 12 27990 -1.575412e-02       3  

My question starts here.

1) How do I get the min, middle and max values of the time column for each set_nbr?

2) How do I use the evaluated set_nbr and Mz values inside data.list?

In short:

After finding the min, middle and max values of the time column, and the corresponding Mz values, for each set_nbr in result, I want to go back to the original data.list and extract the Mx, My and Mz columns that match those set_nbr and Mz values. Since each set_nbr actually corresponds to 600 rows, I would like to extract the whole family of rows for those set_nbr values from data.list.

We use time as a factor to select set_nbr. Here "factor" means an extraction parameter, not an R factor object.

In addition, as you can see, there are four rows per set_nbr in result, but each set_nbr refers to a different data set in data.list.

Alexander
  • there's a lot of code that doesn't seem essential to the question. Could you try to formulate a minimal example, isolating the core of your question? – baptiste Jun 07 '15 at 05:12
  • @baptiste Actually `myfun` is needed because you cannot reach the condition without that function. Besides I put comment for each part what's happening inside of it. – Alexander Jun 07 '15 at 05:26
  • Then you should narrow down your problem. It seems like you have one big question that is really 3 small questions. "I want to return back to original data.list and select Mx, My, Mz values according those of set_nbr of values." That sounds like Question 1. Maybe make a minimal example of *just that*, then when that part is solved, move on from there. – Gregor Thomas Jun 09 '15 at 04:45
  • @Gregor thanks for advice. I revised my question. please check. – Alexander Jun 09 '15 at 14:04
  • This is still at least two questions. Graphing the series has nothing to do with extracting the min and max. I will answer the first part. **You should delete the graphing bit from this question and ask it as a separate, new question if you still need help**. – Gregor Thomas Jun 09 '15 at 15:23

1 Answer

I'm a big advocate of using lists of data frames when appropriate, but in this case it doesn't look like there's any reason to keep them separated as different list items. Let's combine them into a single data frame.

library(dplyr)
dat = bind_rows(data.list)

Then getting your summary stats is easy:

dat %>% group_by(set_nbr) %>%
    summarize(min_time = min(time),
              max_time = max(time),
              middle_time = median(time))

# Source: local data frame [3 x 4]
#
#   set_nbr min_time max_time middle_time
# 1       1       90    54000       27045
# 2       2       90    54000       27045
# 3       3       90    54000       27045

In your sample data, time is defined the same way each time, so of course the min, median, and max are all the same.
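The same grouped summary can also be run on the `result` object from the question instead of the full data. Here is a minimal base R sketch, assuming `result` is a data frame with the `time`, `Mz` and `set_nbr` columns shown in the question (the values are retyped from that printout) and taking "middle" to mean the median:

```r
# `result` retyped from the printout in the question
# (its exact structure is an assumption)
result <- data.frame(
  time    = c(27810, 28980, 28350, 27900,
              25560, 27360, 28080, 26550,
              26820, 26550, 26550, 27990),
  Mz      = c(-1.917835e-03, -1.344288e-03, -3.426615e-05, -9.934413e-04,
              -1.016492e-02, -4.790767e-03, -7.062256e-04, -1.171716e-04,
              -2.495893e-03, -7.397865e-03, -2.574022e-03, -1.575412e-02),
  set_nbr = rep(1:3, each = 4)
)

# Split by set_nbr, summarise each piece, then stack the pieces again
time_summary <- do.call(rbind, lapply(split(result, result$set_nbr), function(d) {
  data.frame(set_nbr     = d$set_nbr[1],
             min_time    = min(d$time),
             middle_time = median(d$time),
             max_time    = max(d$time))
}))
print(time_summary)
```

With four rows per group, the median is the average of the two middle times, so it does not have to coincide with an observed row (for set 1 it is 28125). If an actual row is needed for the "middle", a different rule would have to be chosen.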

I'd suggest, in the new question you ask about plotting, starting with the combined data frame dat.

As to your second question:

2) How to select evaluated set_nbr values inside of data.list?

To select a single item from a list, use double brackets:

data.list[[2]]

However, with the combined data, it's just a normal column of a normal data frame so any of these will work:

dat[dat$set_nbr == 2, ]
subset(dat, set_nbr == 2)
filter(dat, set_nbr == 2)
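If the goal is to pull out whole data sets for several `set_nbr` values at once, one possible base R sketch follows. The small `data.list` and the vector `wanted` are made up for illustration, and the list indexing relies on the assumption that element `i` of the list holds `set_nbr == i`, as in the question's setup code.

```r
# Small stand-in for data.list: element i holds set_nbr == i (hypothetical data)
data.list <- lapply(1:3, function(x) {
  data.frame(time = c(90, 180, 270),
             Mx = x + c(0.1, 0.2, 0.3),
             set_nbr = x)
})

# Hypothetical set_nbr values of interest, e.g. unique(result$set_nbr)
wanted <- c(1, 3)

# From the list: single brackets keep a list of the selected data frames
picked_list <- data.list[wanted]

# From a combined data frame: filter on the set_nbr column
dat <- do.call(rbind, data.list)  # base R way to stack the data frames
picked_dat <- dat[dat$set_nbr %in% wanted, ]
```

Single brackets return a list, so `picked_list[[1]]` is still needed to get at an individual data frame.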

To your clarification in comments, if you want the Mx and My values for the time and set_nbr in the result object, using my combined dat above, simply do a join: left_join(result, dat). By default this joins on all columns the two data frames share (here time, Mz and set_nbr).

This should work, but I'm a little confused because in your simulated data time is numeric, but in your new text you say "we use time as a factor". If you've converted time to a factor object, this will only work if it has the same levels in each of the data frames in your data list. If not, I would recommend keeping time as numeric.
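A minimal sketch of such a join in base R, using `merge()` in place of `left_join()`. The tiny `dat` and `result` below are invented for illustration. Like `left_join()`, `merge()` joins on all shared column names by default, and `all.x = TRUE` keeps every row of `result`.

```r
# Hypothetical combined data and result, small enough to check by eye
dat <- data.frame(
  time    = c(90, 180, 90, 180),
  Mx      = c(0.01, 0.02, 0.03, 0.04),
  My      = c(0.1, 0.2, 0.3, 0.4),
  Mz      = c(-0.5, 0.5, -0.25, 0.25),
  set_nbr = c(1, 1, 2, 2)
)
result <- data.frame(time = c(180, 90), Mz = c(0.5, -0.25), set_nbr = c(1, 2))

# Joins on time, Mz and set_nbr (the shared columns); adds Mx and My
joined <- merge(result, dat, all.x = TRUE)
print(joined)

# If time has been turned into a factor, go through character to get
# the original numbers back (as.numeric() alone would return level codes)
time_factor  <- factor(c("90", "180"))
time_numeric <- as.numeric(as.character(time_factor))
```

Matching on a floating-point column such as Mz only succeeds when the values are exactly equal in both data frames, so this assumes the Mz values in `result` were taken from the data unchanged.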

Gregor Thomas
  • Thank you for your answer. I think I might have mixed up some things and not explained the unique points of the question clearly. You started directly with data.list, which is actually the main data. We should start with result, select those rows, and use them as an extraction parameter for data.list. I revised the question a little bit. – Alexander Jun 10 '15 at 14:05
  • @aoronbarlow don't change your question once it's been answered. Ask a new question if your question changes. – Gregor Thomas Jun 10 '15 at 17:23
  • Your answer is a bit insufficient because you are not checking the condition in `result` and using it in `data.list` to extract the related `set_nbr` datasets. – Alexander Jun 10 '15 at 22:58
  • @aoronbarlow added a couple paragraphs at the end. – Gregor Thomas Jun 10 '15 at 23:20
  • Thanks, I checked your comments. On the other hand, I really don't know why my question remains hard to understand even though I try to be very clear. Anyway, adding `Mx` and `My` columns to the left of `result` is not what I am looking for. – Alexander Jun 11 '15 at 01:19
  • I am looking for each unique set_nbr from `result`; that's understood, I believe. Then each of these `set_nbr` values corresponds to 600 rows in data.list, right? For instance, in the case of the `min time` value together with that `Mz`, we have one `set_nbr`, let's say 1. I want to extract that `set_nbr` dataset from data.list with all of `Mx`, `My` and `Mz`. – Alexander Jun 11 '15 at 01:42
  • I don't understand at all. Can you make your example **minimal**, that is, not 600 rows but maybe 12 rows? And then you can show both your expected input (maybe just `result`?) **and** your expected output? I have no idea what end result you are expecting anymore. – Gregor Thomas Jun 11 '15 at 04:06
  • I am confused because your data.list has three values of `set_nbr`, 1, 2 and 3. Your `result` has the same three values of `set_nbr`. And you say "I want to extract that `set_nbr` dataset with all of Mx, My and Mz". If `set_nbr` is 1, then this is `subset(dat, set_nbr == 1)` or `data.list[[1]]`. – Gregor Thomas Jun 11 '15 at 04:07
  • But let me also say that this question has gone in circles long enough. If you can make a *small, clear, minimal example*, with expected input and output, please ask it as a new question. – Gregor Thomas Jun 11 '15 at 04:08
  • OK, thanks for the advice. I'll make it shorter and more understandable. – Alexander Jun 11 '15 at 04:12