I have thousands of comma separated .txt files with two columns, where one column has "wavelength" for the column name and the same wavelength values ("x" values) for all files, and the other column has the file name as the column name and response values (various observed "y" values).
If I read in a single file by readr, the format appears like this:
# A tibble: 2,151 x 2
Wavelength a1lm_00000.asd.ref.sco.txt ### [filename]
<dbl> <dbl>
1 350 0.0542
2 351 0.0661
3 352 0.0686
4 353 0.0608
5 354 0.0545
6 355 0.0589
7 356 0.0644
8 357 0.0587
9 358 0.0556
10 359 0.0519
...etc.
The end format I need is:
Filename "350" "351" "352" "353" etc.
a1lm_00000.asd.ref.sco.txt 0.0542 0.0661 0.0686 0.0608 etc.
a1lm_00001.asd.ref.sco.txt 0.0567 0.0680 0.0704 0.0627 etc.
...etc.
In other words, I need the first column as the file identifier, and each following column a spectral response with the associated spectral wavelength as the column name.
So, I need to read all these files in from a directory, and either:
a.) Create a third column that is the file name, make all second columns names something like "response", apply bind_rows to all files, then use "spread" in the tidyr package.
b.) Transpose each files as soon as it is read, in such a way that the first row becomes all column names, the second row column name is inserted into a first column for row identifiers by file name, and row bind these resulting rows.
Option b. seems preferable. Either options seems like I will need to use either lapply and possibly bind_rows or bind_cols. But I'm not sure how best to do so. There are a lot of data, and a few of the methods I've used have caused my machine to run out of memory, so the more memory-efficient I can make it the better.