11

I would like to merge multiple data.frame in R using row.names, doing a full outer join. For this I was hoping to do the following:

x = as.data.frame(t(data.frame(a=10, b=13, c=14)))
y = as.data.frame(t(data.frame(a=1, b=2)))
z = as.data.frame(t(data.frame(a=3, b=4, c=3, d=11)))
res = Reduce(function(a,b) merge(a,b,by="row.names",all=T), list(x,y,z))

Warning message:
In merge.data.frame(a, b, by = "row.names", all = T) :
  column name ‘Row.names’ is duplicated in the result
> res
  Row.names Row.names V1.x V1.y V1
    1         1         a   10    1 NA
    2         2         b   13    2 NA
    3         3         c   14   NA NA
    4         a      <NA>   NA   NA  3
    5         b      <NA>   NA   NA  4
    6         c      <NA>   NA   NA  3
    7         d      <NA>   NA   NA 11

What I was hoping to get would be:

    V1 V2 V3
  a 10 1  3
  b 13 2  4
  c 14 NA 3
  d NA NA 11
Alex
  • 17,745
  • 33
  • 112
  • 182

3 Answers3

9

The following works (up to some final column renaming):

res <- Reduce(function(a,b){
        ans <- merge(a,b,by="row.names",all=T)
        row.names(ans) <- ans[,"Row.names"]
        ans[,!names(ans) %in% "Row.names"]
        }, list(x,y,z))

Indeed:

> res
  V1.x V1.y V1
a   10    1  3
b   13    2  4
c   14   NA  3
d   NA   NA 11

What happens with a row join is that a column with the original rownames is added in the answer, which in turn does not contain row names:

> merge(x,y,by="row.names",all=T)
  Row.names V1.x V1.y
1         a   10    1
2         b   13    2
3         c   14   NA

This behavior is documented in ?merge (under Value)

If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.

When Reduce tries to merge again, it doesn't find any match unless the names are cleaned up manually.

Ryogi
  • 4,907
  • 5
  • 22
  • 43
2

For continuity, this is not a clean solution but a workaround, I transform the list argument of 'Reduce' using sapply.

Reduce(function(a,b) merge(a,b,by=0,all=T),
                      sapply(list(x,y,z),rbind))[,-c(1,2)]
   x y.x y.y
1 10   1   3
2 13   2   4
3 14  NA   3
4 NA  NA  11
Warning message:
In merge.data.frame(a, b, by = 0, all = T) :
  column name ‘Row.names’ is duplicated in the result
agstudy
  • 113,354
  • 16
  • 180
  • 244
0

For some reason I did not have much success with Reduce. given a list of data.frames (df.lst) and a list of suffixes (suff.lst) to change the names of identical columns, this is my solution (it's loop, I know it's ugly for R standards, but it works):

df.merg <- as.data.frame(df.lst[1])
colnames(df.merg)[-1] <- paste(colnames(df.merg)[-1],suff.lst[[1]],sep="")
for (i in 2:length(df.lst)) {
    df.i <- as.data.frame(df.lst[i])
    colnames(df.i)[-1] <- paste(colnames(df.i)[-1],suff.lst[[i]],sep="")
    df.merg <- merge(df.merg, df.i, by.x="",by.y="", all=T)
}
rs028
  • 1,051
  • 2
  • 8
  • 18