
When rbind-ing multiple data frames, I'd like to mark where each of the original data frames starts. So when using:

df1<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4))
df2<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4))
dfTotal<-rbind(df1,df2)

I'd like an indicator showing where df2 starts in dfTotal.


Two questions:

  1. Can this be done?
  2. Is there a better way to have the first column just go from 1 to 8?
Tobias van Elferen

4 Answers


We can use `rbindlist` with the `idcol` argument. For an unnamed list, the `grp` column holds 1 for rows from df1 and 2 for rows from df2:

library(data.table)
rbindlist(list(df1,df2), idcol='grp')

If there are multiple datasets named 'df' followed by a number, we can use `mget` with `paste0` to collect them into a named `list`. The list names (df1, df2, ...) then become the `grp` values:

rbindlist(mget(paste0("df", 1:2)), idcol = "grp")

Or use bind_rows from dplyr

library(dplyr)
bind_rows(df1, df2, .id='grp')

Or we can use base R in a compact way

do.call(rbind, Map(cbind, ind = 1:2, mget(paste0("df", 1:2))))
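The compact base R one-liner can be unpacked step by step to see what each call contributes. This is a sketch assuming `df1` and `df2` are built as in the question: `Map()` pairs each `ind` value with one data frame, `cbind()` prepends a constant `ind` column to it, and `do.call(rbind, ...)` stacks the tagged pieces.

```r
# Sample data shaped like the question's
df1 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))
df2 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))

# mget() looks the objects up by name and returns a named list
dfs <- mget(paste0("df", 1:2))

# Map() calls cbind(ind = 1L, df1) and cbind(ind = 2L, df2),
# so each piece gets a constant 'ind' column in front
pieces <- Map(cbind, ind = 1:2, dfs)

# Stack the tagged pieces into one data frame
dfTotal <- do.call(rbind, pieces)
dfTotal$ind
# [1] 1 1 1 1 2 2 2 2
```

Because `ind` is the first argument to `cbind()`, it ends up as the first column of the result.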
akrun

How about this one using base R functions:

dfTotal <- rbind(df1, df2)
cbind(indicator = c(rep("df1", nrow(df1)), rep("df2", nrow(df2))), dfTotal)

would give you:

  indicator c.1..2..3..4.  rnorm.1.4. rnorm.1.4..1 rnorm.1.4..2
1       df1             1 -0.50219235    0.1169713  -0.82525943
2       df1             2  0.13153117    0.3186301  -0.35986213
3       df1             3 -0.07891709   -0.5817907   0.08988614
4       df1             4  0.88678481    0.7145327   0.09627446
5       df2             1 -0.20163395   -0.3888542  -0.43808998
6       df2             2  0.73984050    0.5108563   0.76406062
7       df2             3  0.12337950   -0.9138142   0.26196129
8       df2             4 -0.02931671    2.3102968   0.77340460

DATA

set.seed(100)
df1<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4))
df2<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4))
dfTotal<-rbind(df1,df2)
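The hard-coded `rep()` calls can be generalized to any number of data frames by keeping them in a named list and repeating each name once per row. A minimal sketch; `dfList` is just an illustrative name, and `vapply()` collects the row counts.

```r
df1 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))
df2 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))

# Named list: the names become the indicator values
dfList <- list(df1 = df1, df2 = df2)

# Repeat each name nrow() times
indicator <- rep(names(dfList), times = vapply(dfList, nrow, integer(1)))

# unname() avoids row names like "df1.1" from rbind on a named list
dfTotal <- cbind(indicator, do.call(rbind, unname(dfList)))
```

Adding a third data frame only means adding it to `dfList`; the row counts are picked up automatically.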
989
  • The benchmark is wrong. Please check at least if the dimension of the result is as expected! Everywhere you use `mget(ls())` inside a function, you need to fetch the values from the right environment. – Arun Jun 13 '16 at 15:45
  • @Arun, thanks for pointing this out. I removed benchmark for the time being. I will take a look later. – 989 Jun 13 '16 at 15:51

A simple way to get a row indicator is to add a variable to each of df1 and df2 before binding:

df1<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4),map="d1")
df2<-data.frame(c(1,2,3,4),rnorm(1:4),rnorm(1:4),rnorm(1:4),map="d2")
dfTotal<-rbind(df1,df2)

  c.1..2..3..4. rnorm.1.4. rnorm.1.4..1 rnorm.1.4..2 map
1             1  1.5211423  -0.05746568    0.7507580  d1
2             2 -0.5016556   0.33257341   -0.7042438  d1
3             3 -0.7154221  -0.79463908   -1.0391944  d1
4             4 -0.3255207   0.04130148   -1.4263133  d1
5             1 -1.5784721   0.58019130   -0.2091264  d2
6             2 -1.1682966  -0.17827840    1.3235675  d2
7             3  0.3025030   1.98774090    0.3537830  d2
8             4  2.5133713  -0.28664053    1.0521226  d2
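If the data frames already exist, the same idea can be applied without editing each `data.frame()` call. A sketch, assuming the labels `d1` and `d2` from the answer; `add_label` is a hypothetical helper name, not part of any package.

```r
df1 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))
df2 <- data.frame(c(1, 2, 3, 4), rnorm(4), rnorm(4), rnorm(4))

# Hypothetical helper: returns a copy of d with a constant 'map' column
add_label <- function(d, label) {
  d$map <- label  # a length-1 value is recycled to every row
  d
}

# Label each frame, then stack them; 'map' ends up as the last column
labelled <- Map(add_label, list(df1, df2), c("d1", "d2"))
dfTotal <- do.call(rbind, labelled)
```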
Arun kumar mahesh

Here is a longer base R method that puts the data.frames into a list for manipulation:

# put the data.frames into a list
dfList <- mget(ls(pattern = "^df[0-9]+$"))  # anchored so only df1, df2, ... match

# append the list of data.frames into a single data.frame
dfTotal <- do.call(rbind, dfList)

# get the ID from the row names; (\\d+) captures the whole number, so "df12.3" gives 12
dfTotal$id <- as.integer(gsub("^df(\\d+).*", "\\1", rownames(dfTotal)))
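A quick way to check the row-name parsing is to run the regular expression on a few row names of the form `rbind` produces for a named list. The names below are made-up examples. A capture group written `(\\d+)` keeps the whole number, whereas `(\\d)+` repeats a one-digit group and keeps only its last repetition.

```r
# Example row names like those from do.call(rbind, <named list>)
rn <- c("df1.1", "df1.2", "df2.1", "df12.3")

# Whole-number capture
as.integer(gsub("^df(\\d+).*", "\\1", rn))
# [1]  1  1  2 12

# Repeated single-digit group: df12.3 yields 2, not 12
as.integer(gsub("^df(\\d)+.*", "\\1", rn))
```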

To see more about working with lists of data.frames, take a look at this post.

lmo