Split data.frame into groups by column name

Question

I'm new to R. I have a data frame with column names like these:

file_001   file_002   block_001   block_002   red_001   red_002   ... etc.
  0.05       0.2        0.4         0.006       0.05       0.3
  0.01       0.87       0.56        0.4         0.12       0.06

I want to split them into groups by the column name, to get a result like this:

group_file
file_001   file_002
  0.05       0.2
  0.01       0.87

group_block
block_001   block_002
  0.4        0.006
  0.56       0.4

group_red
red_001    red_002
  0.05       0.3
  0.12       0.06

... etc.

My file is huge, and the number of groups isn't fixed, so the grouping has to be based only on the prefix of each column name.
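To make the example reproducible, here is a minimal sketch that rebuilds the sample shown above. The name `dat` comes from the answers below, which assume the data frame is called that; the values are only the two sample rows, not the real file.

```r
# Rebuild the two sample rows from the question as a data frame named dat
dat <- data.frame(
  file_001  = c(0.05, 0.01),
  file_002  = c(0.2, 0.87),
  block_001 = c(0.4, 0.56),
  block_002 = c(0.006, 0.4),
  red_001   = c(0.05, 0.12),
  red_002   = c(0.3, 0.06)
)

# Inspect column names and types
str(dat)
```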

Relevant: [What is the algorithm behind R core's `split` function?](https://stackoverflow.com/q/52158589/4891738) — 李哲源, Sep 04 '18 at 13:12

lmo · Answer 1 · answered Nov 14 '17 at 15:36

In base R, you can use sub and split.default like this to return a list of data.frames:

myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))

This returns:

myDfList
$block
  block_001 block_002
1      0.40     0.006
2      0.56     0.400

$file
  file_001 file_002
1     0.05     0.20
2     0.01     0.87

$red
  red_001 red_002
1    0.05    0.30
2    0.12    0.06

split.default splits a data.frame by column (variable) according to its second argument. Here, we use sub and the regular expression "_\d+" to remove the underscore and the digits that follow it, which yields the splitting values "block", "file", and "red".
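To see what split.default actually receives, the sketch below applies the same sub() call to the column names alone and then groups the names by the result. Plain split() on a data frame would dispatch to split.data.frame(), which splits rows rather than columns; that is why the answer calls split.default() explicitly. The variable names nms, groups, and byGroup are illustrative only.

```r
# Column names from the question
nms <- c("file_001", "file_002", "block_001", "block_002", "red_001", "red_002")

# Strip "_" plus the trailing digits: one grouping key per column
groups <- sub("_\\d+", "", nms)
groups  # "file" "file" "block" "block" "red" "red"

# Group the names by key; the keys become factor levels, which are sorted,
# so the groups come back as block, file, red (same order as the output above)
byGroup <- split(nms, groups)
byGroup
```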

As a side note, it is typically a good idea to keep these data.frames in a list and work with them through functions like lapply. See gregor's answer to this post for some motivating examples.
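A short sketch of working with the list instead of separate objects, assuming the sample data from the question. The summaries chosen (column counts and row means) are arbitrary examples, and the "group_" renaming is an optional step added here to mirror the names the question asked for.

```r
# Sample data from the question
dat <- data.frame(
  file_001 = c(0.05, 0.01), file_002 = c(0.2, 0.87),
  block_001 = c(0.4, 0.56), block_002 = c(0.006, 0.4),
  red_001 = c(0.05, 0.12), red_002 = c(0.3, 0.06)
)
myDfList <- split.default(dat, sub("_\\d+", "", names(dat)))

# Number of columns in each group (a named integer vector)
groupSizes <- sapply(myDfList, ncol)

# Row-wise mean within each group (a list of numeric vectors)
groupMeans <- lapply(myDfList, rowMeans)

# Optional: prefix list names to match the "group_file" style in the question
names(myDfList) <- paste0("group_", names(myDfList))
myDfList$group_file
```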

The result I got was: '$file_001 file_001 0.05 0.01 $file_002 file_002 0.2 0.87' — Keity, Nov 15 '17 at 15:25

score 0 · Answer 2 · answered Nov 19 '17 at 10:29

Thank you, lmo. Your code didn't work for me as written, but I came up with a solution thanks to your guidance.

So, to split the data frame into a list of data frames:

myDfList <- split.default(dat, sub("_.*", "", names(dat)))  # drop the first "_" and everything after it

Hope it helps people in the future!
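The two answers use different patterns. lmo's "_\\d+" removes only an underscore followed by digits, while an underscore-and-everything-after pattern such as "_.*" removes the first underscore and the rest of the name. They agree on names like those in the question but differ when a suffix is not purely numeric. The sketch below compares them; file_a and red_x1 are hypothetical names invented for illustration, not taken from the question.

```r
# First two names follow the question; the last two are hypothetical
nms <- c("file_001", "block_002", "file_a", "red_x1")

# "_" plus digits only: names without "_<digits>" pass through unchanged
keepNonDigit <- sub("_\\d+", "", nms)
keepNonDigit  # "file" "block" "file_a" "red_x1"

# First "_" and everything after it: only the prefix survives
dropSuffix <- sub("_.*", "", nms)
dropSuffix    # "file" "block" "file" "red"
```

With "_\\d+", any unmatched name becomes its own one-column group; with "_.*", grouping depends only on the text before the first underscore.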
