0

I'm just trying to get my head around tidying my data and I have this problem:

I have data as follows:

ID  Tx1  Tx1Date   Tx1Details Tx2  Tx2Date   Tx2Details Tx3  Tx1Date   Tx1Details  
1   14   12/3/14      blabla  1e   12/5/14   morebla   r    14/2/14   grrr
2   23   14/5/16      albalb  342  1/4/5     teeee     s    5/6/17    purrr

I want the data to be in the format

ID Tx  TxDate  TxDetails
1 14   12/3/14   blabla
1 1e   12/5/14   morebla
1 r    14/2/14   grrr
2 23   14/5/16   albalb 
2 342  1/4/5     teeee
2 s    5/6/17    purrr

I have used

library(tidyr)
library(dplyr)
NewData<-mydata %>% gather(key, value, "ID", 2:10)

but I'm not sure how to rename the columns as per the intended output to see if this will work

Sebastian Zeki
  • 5,996
  • 7
  • 44
  • 98

2 Answers2

2

You can rename your data frame column names to a more conventional separable names and then use the base reshape function, assuming your initial data frames looks like this(changed the last two column names to Tx3Date and Tx3Details as otherwise they are duplicates of columns 4 and 5):

df
#  ID Tx1 Tx1Date Tx1Details Tx2 Tx2Date Tx2Details Tx3 Tx3Date Tx3Details
#1  1  14 12/3/14     blabla  1e 12/5/14    morebla   r 14/2/14       grrr
#2  2  23 14/5/16     albalb 342   1/4/5      teeee   s  5/6/17      purrr

names(df) <- gsub("(\\d)(\\w*)", "\\2\\.\\1", names(df))

df
#  ID Tx.1 TxDate.1 TxDetails.1 Tx.2 TxDate.2 TxDetails.2 Tx.3 TxDate.3 TxDetails.3
#1  1   14  12/3/14      blabla   1e  12/5/14     morebla    r  14/2/14        grrr
#2  2   23  14/5/16      albalb  342    1/4/5       teeee    s   5/6/17       purrr

reshape(df, varying = 2:10, idvar = "ID", dir = "long")

#    ID time  Tx  TxDate TxDetails
#1.1  1    1  14 12/3/14    blabla
#2.1  2    1  23 14/5/16    albalb
#1.2  1    2  1e 12/5/14   morebla
#2.2  2    2 342   1/4/5     teeee
#1.3  1    3   r 14/2/14      grrr
#2.3  2    3   s  5/6/17     purrr

Drop the redundant time variable if you don't need it.

Psidom
  • 171,477
  • 20
  • 249
  • 286
  • 1
    great code !! I had been waiting for an answer... sometimes I get confused between `dcast`, `gather`, `melt`,`reshape`. after seeing the answers, things look so easy but! – joel.wilson Dec 16 '16 at 20:17
1

The data.table package handles this pretty well.

library(data.table)
setDT(df)
melt(df, measure = list(Tx = grep("^Tx[0-3]$", names(df)), 
                        Date =  grep("Date", names(df)), 
                        Details = grep("Details", names(df))),
         value.name = c("Tx", "TxDate", "TxDetails"))

Or more concisely

melt(df, measure = patterns("^Tx[0-3]$", "Date", "Details"),
         value.name = c("Tx", "TxDate", "TxDetails"))
Ista
  • 9,106
  • 2
  • 32
  • 30