
I've got some large data files with biometric time course data for different subjects. I've already been able to average subsets of data and export specific single-number metrics, and now I am attempting to modify my code to save the raw time course data (i.e., subsets of data) into a data.frame for later averaging and visualization.

I have tried setting up a data.frame like this:

results = data.frame(filename = character(), SubNum = numeric(), 
APTCode = character(), Pcode = character(), 
FAAdata = logical(), FixPlus_AvgFAA = numeric(), 
FAA_pringles = t(vector(mode = 'numeric', length = 25)), 
FAA_ax = t(vector(mode = 'numeric', length = 25)), 
FAA_pin = t(vector(mode = 'numeric', length = 25)), 
FAA_inf = t(vector(mode = 'numeric', length = 25)), 
FAA_lev = t(vector(mode = 'numeric', length = 25)), 
FAA_col = t(vector(mode = 'numeric', length = 25)))

My idea is to have a data frame with several transposed numeric vectors of length 25 that can be set to zero, and then filled in with the appropriate subsets. But R doesn't like this expression, and I get the following error:

 Error in data.frame(filename = character(), SubNum = numeric(), APTCode = character(),  : 
  arguments imply differing number of rows: 0, 1

I seem to be able to get it to work for 1 vector, but not for more than one vector, i.e., this works:

try1 = data.frame(longvector = t(vector(mode = 'numeric', length = 25)))

But this doesn't work:

try2 = data.frame(longvector = t(vector(mode = 'numeric', length = 25), bigvector2 = t(vector(mode = 'numeric', length = 25))))

I get the error:

Error in t(vector(mode = "numeric", length = 25), bigvector2 = t(vector(mode = "numeric",  : 
  unused argument (bigvector2 = t(vector(mode = "numeric", length = 25)))

Basically, I need a long string of numbers to hold the extracted subset of data.
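
For reference, here is a minimal sketch of what these calls produce, which seems to account for both errors. The names `v`, `one_row`, and `fixed_try2` are illustrative only and are not part of the original code.

```r
# t() on a plain vector returns a 1 x 25 matrix, so data.frame() spreads it
# over 25 columns in a single row.
v <- t(vector(mode = "numeric", length = 25))
dim(v)

one_row <- data.frame(longvector = v)
dim(one_row)

# character() and numeric() with no length argument are zero-length, so they
# ask for 0 rows while each transposed vector asks for 1 row.
length(character())

# In try2 the closing parenthesis of the first t() is misplaced, so
# bigvector2 is passed to t() instead of data.frame(). Moving it gives:
fixed_try2 <- data.frame(
  longvector = t(vector(mode = "numeric", length = 25)),
  bigvector2 = t(vector(mode = "numeric", length = 25))
)
dim(fixed_try2)
```

With the parenthesis moved, the result has one row and 50 columns (25 per transposed vector).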

Ken
  • I don't think you want to transpose your vectors... vectors are column vectors by default, so transposing a vector and putting it in a data frame will create one row with multiple columns. – Gregor Thomas Jul 03 '17 at 17:31
  • As for the different lengths, `numeric()` has length zero, so creates an empty column (no rows). `FAA_ax = t(vector(mode = 'numeric', length = 25))` tries to create 25 columns with 1 row (all values default to 0). Your data frame can't have both 0 rows and 1 row, hence the error. – Gregor Thomas Jul 03 '17 at 17:34
  • You can always add columns to a data frame later - I would recommend getting the data values you want and then creating the data frame last, rather than trying to initialize a 0 or 1 row data frame to (presumably) add data to one row at a time. – Gregor Thomas Jul 03 '17 at 17:36
  • Hello Gregor, thanks for your response. You are correct that I am initializing a data.frame and then adding one line at a time, or for each participant. I am actually modifying some existing code that I already have working, hence my desire to just do it this way. – Ken Jul 03 '17 at 18:15
  • I think you are right: I don't need to transpose.. I've removed that, but still get an error. Basically, I do want a data.frame with character and other types, as well as vectors of length 25, e.g., [0, 0, 0, 0... 0]. These vectors are meant to be used as containers for time course information that I will want to later average across subjects and plot... but I am having trouble creating a data frame that includes vectors. – Ken Jul 03 '17 at 18:18
  • You just need the lengths to match. `numeric()`, `character()`, etc., have a length argument that defaults to 0. You seem to want 25 rows, so use `filename = character(25)` or `filename = rep(NA_character_, 25)` instead of `character()`. – Gregor Thomas Jul 03 '17 at 18:21
  • And do it for everything, `numeric(25)` is the same as `vector(mode = 'numeric', length = 25)`. But `numeric(25)` is shorter and clearer. – Gregor Thomas Jul 03 '17 at 18:22
  • Hi all, I think I might have a way to make this work... when I take a look at how the vectors are created, I think I do need to do a transpose... basically, I want to capture a bunch of time course data as a long series of numbers. I appear to need the transpose to do that (so the numbers are all in one row, and not in multiple rows). However, it looks like I can use the same format to create character columns, so: – Ken Jul 03 '17 at 18:28
  • results3 = data.frame(filename = t(vector("character", length = 1)), FAA_prin = t(vector("numeric", length = 25)), FAA_ax = t(vector("numeric", length = 25)), FAA_pine = t(vector("numeric", length = 25)), FAA_inf = t(vector("numeric", length = 25)), FAA_lev = t(vector("numeric", length = 25)), FAA_col = t(vector("numeric", length = 25))) – Ken Jul 03 '17 at 18:28
  • Okay, I think this might be the solution. Thanks for your help Gregor... results = data.frame(filename = character(1), SubNum = numeric(1), APTCode = character(1), Pcode = character(1), FAAdata = logical(1), FixPlus_AvgFAA = numeric(1), FAA_prin = t(numeric(25)), FAA_ax = t(numeric(25)), FAA_pine = t(numeric(25)), FAA_inf = t(numeric(25)), FAA_lev = t(numeric(25)), FAA_col = t(numeric(25))) – Ken Jul 03 '17 at 18:35
  • Okay, glad to hear it's working. It looks terribly ugly to me with all the transposed vectors. Also, growing objects in a loop (like adding rows to a data frame one at a time) is [the 2nd Circle of R Hell](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf) - it's terribly inefficient. – Gregor Thomas Jul 03 '17 at 18:44
  • Gregor... I am making progress. Thx again for your help. And, I am curious, what would be a better control structure, given that I need to work through a list of 100 participants with separate data files for each one, including missing data files? I do want to learn to be a better R programmer, and right now I feel like I am kludging my way through. – Ken Jul 03 '17 at 20:02
  • It's hard to know without seeing some sample data. If the data all fits in memory, then it probably makes sense [to use a list of data frames](https://stackoverflow.com/a/24376207/903061). If each data frame has the same structure then you can combine them into a single data frame with a `participant` column to tell the rows apart, and then use grouped operations with either `data.table` or `dplyr` to do all your calculating at once. – Gregor Thomas Jul 03 '17 at 20:07
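
A rough sketch of the list-of-data-frames approach suggested in the comments, using base R only. Everything here is assumed for illustration: `read_participant()` is a hypothetical stand-in for whatever reads and subsets one participant's file, the column names `condition`, `timepoint`, and `FAA` are invented, and the values are made up. It stores each time course in long format (one row per time point) rather than as 25 columns in one row.

```r
# Hypothetical stand-in for reading one participant's file; real code would
# call read.csv() (or similar) and subset the time course here.
read_participant <- function(sub_num) {
  data.frame(
    SubNum    = sub_num,
    condition = rep(c("ax", "pin"), each = 25),
    timepoint = rep(1:25, times = 2),
    FAA       = seq_len(50) / 10 + sub_num   # made-up values
  )
}

sub_nums <- 1:3   # assumed participant IDs

# Build every per-participant data frame first, then combine once,
# instead of growing one data frame row by row inside a loop.
per_subject <- lapply(sub_nums, read_participant)
all_data <- do.call(rbind, per_subject)

# Average across participants for each condition and time point.
avg <- aggregate(FAA ~ condition + timepoint, data = all_data, FUN = mean)
```

Keeping `SubNum` as a column is what lets the rows be told apart after combining. The same grouped average could be written with `dplyr` or `data.table`, as the comment suggests.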
