0

I've searched quite some time for this and can't figure out a way that doesn't seem to send me down what appears to be the wrong path. I'm trying to replicate the following SAS implementation within R.

Right now I’m trying to figure out how to transpose, without aggregation, several values based on several identifying fields.

Example starting point:

Cat1  Cat2      Cat3    Date        Occ  Dur
A00   Group1    Sub1    2015-05-09  1    30
A00   Group1    Sub1    2015-09-09  2    30
A00   Group1    Sub2    2015-06-23  1    60
B00   Group1    Sub1    2015-07-30  3    30
B00   Group1    Sub2    2015-03-25  1    60
B00   Group1    Sub2    2015-02-14  2    60

And i'm looking to get the following output:

Cat1    Cat2    Cat3    Date1       Date2       Occ1    Occ2    Dur1  Dur2
A00     Group1  Sub1    2015-05-09  2015-09-09  1       2         30    30
A00     Group1  Sub2    2015-06-23              1                 60      
B00     Group1  Sub1    2015-07-30              3                 30      
B00     Group1  Sub2    2015-03-25  2015-02-14  1       2         60    60

I realize that different environments may require different approaches. I'm certainly open for alternative solutions than directly trying to replicate the logic within SAS. I've tried various attempts at reshaping the data with melt and cast without any luck. Any assistance would be hugely appreciated!

IRTFM
  • 240,863
  • 19
  • 328
  • 451
MKK
  • 3
  • 1
  • Would R better work with data following SAS implementation? –  Jan 14 '16 at 23:19
  • 2
    This might help you: http://stackoverflow.com/questions/30592094/r-spreading-multiple-columns-with-tidyr – Gopala Jan 14 '16 at 23:32
  • Try searching here for `[r] long wide reshape` - something like - `reshape(transform(dat,Occtmp=Occ), idvar=c("Cat1","Cat2","Cat3"), direction="wide", timevar="Occtmp", sep="")` should do it, but have a read of other questions to get your head around this logic. – thelatemail Jan 15 '16 at 00:02
  • @MKK: You should check the edit I made. The values of 3 and 4 in the original made no sense. You should also replace any tab characters with appropriate number of spaces since SO only gives them 2 spaces and your displayed values made no sense after the "compression". – IRTFM Jan 15 '16 at 00:18

2 Answers2

1

Here is a data.table based solution that mimics the logic pretty closely:

library(data.table)
library(reshape2)

DT <- fread("Cat1    Cat2    Cat3    Date    Occ Dur
            A00 Group1  Sub1    2015-05-09  1   30
            A00 Group1  Sub1    2015-09-09  2   30
            A00 Group1  Sub2    2015-06-23  1   60
            B00 Group1  Sub1    2015-07-30  3   30
            B00 Group1  Sub2    2015-03-25  1   60
            B00 Group1  Sub2    2015-02-14  2   60")


DTw <- dcast(
  melt(DT, id.vars = c("Cat1", "Cat2", "Cat3"))[
    , Idx := 1:.N
    , keyby = .(Cat1,Cat2, Cat3, variable)
    ]
  , Cat1 + Cat2 + Cat3 ~ variable + Idx)

DTw

The result look like this:

  Cat1   Cat2 Cat3     Date_1     Date_2 Occ_1 Occ_2 Dur_1 Dur_2
1  A00 Group1 Sub1 2015-05-09 2015-09-09     1     2    30    30
2  A00 Group1 Sub2 2015-06-23       <NA>     1  <NA>    60  <NA>
3  B00 Group1 Sub1 2015-07-30       <NA>     3  <NA>    30  <NA>
4  B00 Group1 Sub2 2015-03-25 2015-02-14     1     2    60    60
Alexander Radev
  • 642
  • 5
  • 11
  • 1
    Thank you Alexander! This looks like exactly what I needed. I appreciate your help! – MKK Jan 15 '16 at 03:37
1

Here is a dpyr and tidyr solution. There may be a way to do this more cleanly, but it works. It does produce a warning about id() is deprecated and I am not sure how to get rid of it

library(dplyr)
library(tidyr)

df %>%
   gather(key, value, -c(Cat1:Cat3)) %>%  ## Put in long format
   group_by(Cat1, Cat2, Cat3, key)   %>%  ## Group for numbering (1,2)
   mutate(rn = row_number())         %>%  ## Add row numbers to unite with key column
   unite(new_key, key, rn, sep = '') %>%  ## Make new unique key to be col name
   spread(new_key, value, fill = '') %>%  ## Put in 'wide' format
   select(Cat1, Cat2, Cat3, Date1, Date2, Occ1, Occ2, Dur1, Dur2)  # re-order columns

Results

  Cat1   Cat2 Cat3      Date1      Date2 Occ1 Occ2 Dur1 Dur2
1  A00 Group1 Sub1 2015-05-09 2015-09-09    1    2   30   30
2  A00 Group1 Sub2 2015-06-23               1        60     
3  B00 Group1 Sub1 2015-07-30               3        30     
4  B00 Group1 Sub2 2015-03-25 2015-02-14    1    2   60   60
steveb
  • 4,686
  • 2
  • 24
  • 32