0

i can figure out the solution of my problem but in a very not optimal way and thus the solution i have is not adapted for a large df. Let me explain.

I have a big dataframe and i need to create new columns by subtracting two others ones. Let me show you using a simple df.

A<-rnorm(10)
B<-rnorm(10)
C<-rnorm(10)
D<-rnorm(10)
E<-rnorm(10)
F<-rnorm(10)
df1<-data_frame(A,B,C,D,E,F) 
# A tibble: 10 x 6
        A          B          C          D          E           F
    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
 1 -2.8750025  0.4685855  2.4435767  1.6999761 -1.3848386 -0.58992249
 2  0.2551404  1.8555876  0.8365116 -1.6151186 -1.7754623  0.04423463
 3  0.7740396 -1.0756147  0.6830024 -2.3879337 -1.3165875 -1.36646493
 4  0.2059932  0.9322016  1.2483196 -0.1787840  0.3546773 -0.12874831
 5 -0.4561725 -0.1464692 -0.7112905  0.2791592  0.5835127  0.16493237
 6  1.2401795 -1.1422917 -0.6189480 -1.4975416  0.5653565 -1.32575021
 7 -1.6173618  0.2283430  0.6154920  0.6082847  0.0273447  0.16771783
 8  0.3340799 -0.5096500 -0.5270123 -0.2814217 -2.3732234  0.27972188
 9 -0.4841361  0.1651265  0.0296500  0.4324903 -0.3895971 -2.90426195
10 -2.7106357  0.5496335  0.3081533 -0.3083264 -0.1341055 -0.17927807

I need (i) to subtract two columns at a similar distance : D-A, E-B, F-C while (ii) giving the new column a name based on the name of the initial variables' names.

I did in that way and it works:

df2<-df1 %>% 
  transmute (!!paste0("diff","D","A") := D-A,
          !!paste0("diff","E","B") := E-B,
          !!paste0("diff","F","C") := F-C)


# A tibble: 10 x 3
   diffDA     diffEB     diffFC
    <dbl>      <dbl>      <dbl>
 1  4.5749785 -1.8534241 -3.0334991
 2 -1.8702591 -3.6310500 -0.7922769
 3 -3.1619734 -0.2409728 -2.0494674
 4 -0.3847772 -0.5775242 -1.3770679
 5  0.7353317  0.7299819  0.8762229
 6 -2.7377211  1.7076482 -0.7068022
 7  2.2256465 -0.2009983 -0.4477741
 8 -0.6155016 -1.8635734  0.8067342
 9  0.9166264 -0.5547236 -2.9339120
10  2.4023093 -0.6837390 -0.4874314

However, i have many columns and i would like to find a way to make the code simpler. I tried many things (like with mutate_all, mutate_at or add_columns) but nothing works...

F. Lyon
  • 55
  • 3
  • Is there a consistent pattern to the columns that you want to subtract? – Randall Helms Oct 11 '18 at 08:16
  • I am not sure to understand clearly what do you mean by consistent pattern; you mean a consistent lag? if it is that, in that case, yes it is. – F. Lyon Oct 11 '18 at 08:53
  • By consistent pattern, I mean do you go in groups of three through the rest of the data set, as shown above? So you have diffDA, diffEB, diffFC, what would be next? Is is diffGD? – Randall Helms Oct 11 '18 at 09:03

3 Answers3

1

OK, here's a method that will work for the full width of your data set.

df1 <- tibble(A = rnorm(10),
        B = rnorm(10),
        C = rnorm(10),
        D = rnorm(10),
        E = rnorm(10),
        F = rnorm(10),
        G = rnorm(10),
        H = rnorm(10),
        I = rnorm(10))

ct <- 1:ncol(df1)

diff_tbl <- tibble(testcol = rnorm(10)) 

for (i in ct) {

  new_tbl <- tibble(col = df1[[i+3]] - df1[[i]])
  names(new_tbl)[1] <- paste('diff',colnames(df1[i+3]),colnames(df1[i]),sep='')

  diff_tbl <- bind_cols(diff_tbl,new_tbl)

}

diff_tbl <- diff_tbl %>%
  select(-testcol)

df1 <- bind_cols(df1,diff_tbl)

Basically, what you are doing is creating a second dummy tibble to compute the differences, iterating over the possible differences (i.e. gaps of three columns) then assembling them into a single tibble, then binding those columns to the original tibble. As you can see, I extended df1 by three extra columns and the whole thing worked like a charm.

It's probable that there's a more elegant way to do this, but this method definitely works. There's one slightly awkward thing in that I had to create the diff_tbl with a dummy column and then remove it before the final bind_cols() call, but it's not a major thing, I think.

Randall Helms
  • 769
  • 3
  • 13
0

You could divide the data frame in two parts and do

inds <- ncol(df1)/2

df1[paste0("diff", names(df1[(inds + 1):ncol(df1)]), names(df1[1:inds]))] <- 
           df1[(inds + 1):ncol(df1)] - df1[1:inds]
Ronak Shah
  • 286,338
  • 16
  • 97
  • 143
0

Note that column names with dashes in them are improper and not recommended.

result = df1[4:6] - df1[1:3]
names(result) = paste(names(df1)[4:6], names(df1)[1:3], sep = "-")
result
#            D-A         E-B        F-C
# 1   0.12459065  0.05855622  0.6134559
# 2  -2.65583389  0.26425762  0.8344115
# 3  -1.48761765 -3.13999402  1.3008065
# 4  -4.37469763  1.37551178  1.3405191
# 5   1.01657135 -0.90690359  1.5848562
# 6  -0.34050959 -0.57687686 -0.3794937
# 7   0.85233808  0.57911293 -0.8896393
# 8   0.01931559  0.91385740  3.2685647
# 9  -0.62012982 -2.34166712 -0.4001903
# 10 -2.21764146  0.05927664  0.3965072
Gregor Thomas
  • 104,719
  • 16
  • 140
  • 257