1

I have a big database that I am working with about 1500 rows. I have a data frame in wide format but I want it to be in a long format but the columns need to be stacked as rows. I have tried using reshape package which helped but I need to get the columns into rows and label each row but the inbred name. I have a data frame ready to go...I have two data frame df is the starting point. df.actual is the data frame I am trying to get. I used the reshape function to stack the inbreds by column by the columns that need to be stacked again like the inbreds were.

 df<-data.frame(inbred=c("x1","x2","x3","x4"),trait1=c(12,43,23,65),obs1=c(2,2,3,2),std.error1=c(.2,.3,.2,.4)
                ,trait2=c(14,53,63,68),obs2=c(3,3,2,2),std.error2=c(.1,.4,.5,.6))
 df
  inbred trait1 obs1 std.error1 trait2 obs2 std.error2
1     x1     12    2        0.2     14    3        0.1
2     x2     43    2        0.3     53    3        0.4
3     x3     23    3        0.2     63    2        0.5
4     x4     65    2        0.4     68    2        0.6
> 

I've tried getting the dataframe to reshape into wide but the columns weren't stacked.

 reshape(df, varying= c("trait1","obs1","std.error1","trait2","obs2","std.error2"),
         idvar = "inbred", direction="long", sep="")
     inbred time trait obs std.error
x1.1     x1    1    12   2       0.2
x2.1     x2    1    43   2       0.3
x3.1     x3    1    23   3       0.2
x4.1     x4    1    65   2       0.4
x1.2     x1    2    14   3       0.1
x2.2     x2    2    53   3       0.4
x3.2     x3    2    63   2       0.5
x4.2     x4    2    68   2       0.6
> 

I am having trouble getting the dataframe to stack the rows and label these rows by there inbred name.

 df.actual = data.frame(inbred=c("x1","x1","x1","x2","x2","x2","x3","x3","x3","x4","x4","x4"),metric=c("data.entry","obs","std.error","data.entry","obs","std.error","data.entry","obs","std.error","data.entry","obs","std.error"),trait1=c(12,2,.2,43,2,.3,23,3,.2,65,2,.4),
                                 trait2=c(14,3,.1,53,3,.4,63,2,.5,68,2,.6))
 df.actual
   inbred     metric trait1 trait2
1      x1 data.entry   12.0   14.0
2      x1        obs    2.0    3.0
3      x1  std.error    0.2    0.1
4      x2 data.entry   43.0   53.0
5      x2        obs    2.0    3.0
6      x2  std.error    0.3    0.4
7      x3 data.entry   23.0   63.0
8      x3        obs    3.0    2.0
9      x3  std.error    0.2    0.5
10     x4 data.entry   65.0   68.0
11     x4        obs    2.0    2.0
12     x4  std.error    0.4    0.6
FrosyFeet456
  • 177
  • 11

2 Answers2

1

Here is an option with pivot_longer where we can get both the formats (second format) by interchanging the positions of names_to.

library(dplyr)
library(tidyr) 
library(stringr)
df %>%
  pivot_longer(cols = -inbred, names_to = c("metric", ".value"),
    names_sep="(?<=[a-z])(?=\\d$)", 
     names_repair = ~ str_replace(., "^(\\d+)", "trait\\1")) %>%
  mutate(metric = replace(metric, metric == "trait", "data.entry"))
# A tibble: 12 x 4
#   inbred metric     trait1 trait2
#   <fct>  <chr>       <dbl>  <dbl>
# 1 x1     data.entry   12     14  
# 2 x1     obs           2      3  
# 3 x1     std.error     0.2    0.1
# 4 x2     data.entry   43     53  
# 5 x2     obs           2      3  
# 6 x2     std.error     0.3    0.4
# 7 x3     data.entry   23     63  
# 8 x3     obs           3      2  
# 9 x3     std.error     0.2    0.5
#10 x4     data.entry   65     68  
#11 x4     obs           2      2  
#12 x4     std.error     0.4    0.6

If we need the reshape output

df %>%
  pivot_longer(cols = -inbred, names_to = c(".value", "metric"),
    names_sep="(?<=[a-z])(?=\\d$)")
# A tibble: 8 x 5
#  inbred metric trait   obs std.error
#  <fct>  <chr>  <dbl> <dbl>     <dbl>
#1 x1     1         12     2       0.2
#2 x1     2         14     3       0.1
#3 x2     1         43     2       0.3
#4 x2     2         53     3       0.4
#5 x3     1         23     3       0.2
#6 x3     2         63     2       0.5
#7 x4     1         65     2       0.4
#8 x4     2         68     2       0.6
akrun
  • 674,427
  • 24
  • 381
  • 486
1

You were almost there! The only thing that was missing was that there are two "trait"s in the data that you need to consider. The solution is to put the column groups into a list, which you can do by name or by number. I'll use numbers here.

To construct the desired metrics variable in the same step, use 1. timevar which gives the time variable a name, and 2. times which assigns names to the times (yielding type character though).

reshape(df, list(2:4, 5:7), idvar="inbred", direction="long", 
        timevar="metric", times=c("trait", "obs", "std.error"))
#              inbred    metric trait1 trait2
# x1.trait         x1     trait   12.0   14.0
# x2.trait         x2     trait   43.0   53.0
# x3.trait         x3     trait   23.0   63.0
# x4.trait         x4     trait   65.0   68.0
# x1.obs           x1       obs    2.0    3.0
# x2.obs           x2       obs    2.0    3.0
# x3.obs           x3       obs    3.0    2.0
# x4.obs           x4       obs    2.0    2.0
# x1.std.error     x1 std.error    0.2    0.1
# x2.std.error     x2 std.error    0.3    0.4
# x3.std.error     x3 std.error    0.2    0.5
# x4.std.error     x4 std.error    0.4    0.6

After that you could use order() to arrange the rows if needed.

jay.sf
  • 33,483
  • 5
  • 39
  • 75