Creating new columns with mutate

Question

i can figure out the solution of my problem but in a very not optimal way and thus the solution i have is not adapted for a large df. Let me explain.

I have a big dataframe and i need to create new columns by subtracting two others ones. Let me show you using a simple df.

A<-rnorm(10)
B<-rnorm(10)
C<-rnorm(10)
D<-rnorm(10)
E<-rnorm(10)
F<-rnorm(10)
df1<-data_frame(A,B,C,D,E,F) 
# A tibble: 10 x 6
        A          B          C          D          E           F
    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
 1 -2.8750025  0.4685855  2.4435767  1.6999761 -1.3848386 -0.58992249
 2  0.2551404  1.8555876  0.8365116 -1.6151186 -1.7754623  0.04423463
 3  0.7740396 -1.0756147  0.6830024 -2.3879337 -1.3165875 -1.36646493
 4  0.2059932  0.9322016  1.2483196 -0.1787840  0.3546773 -0.12874831
 5 -0.4561725 -0.1464692 -0.7112905  0.2791592  0.5835127  0.16493237
 6  1.2401795 -1.1422917 -0.6189480 -1.4975416  0.5653565 -1.32575021
 7 -1.6173618  0.2283430  0.6154920  0.6082847  0.0273447  0.16771783
 8  0.3340799 -0.5096500 -0.5270123 -0.2814217 -2.3732234  0.27972188
 9 -0.4841361  0.1651265  0.0296500  0.4324903 -0.3895971 -2.90426195
10 -2.7106357  0.5496335  0.3081533 -0.3083264 -0.1341055 -0.17927807

I need (i) to subtract two columns at a similar distance : D-A, E-B, F-C while (ii) giving the new column a name based on the name of the initial variables' names.

I did in that way and it works:

df2<-df1 %>% 
  transmute (!!paste0("diff","D","A") := D-A,
          !!paste0("diff","E","B") := E-B,
          !!paste0("diff","F","C") := F-C)


# A tibble: 10 x 3
   diffDA     diffEB     diffFC
    <dbl>      <dbl>      <dbl>
 1  4.5749785 -1.8534241 -3.0334991
 2 -1.8702591 -3.6310500 -0.7922769
 3 -3.1619734 -0.2409728 -2.0494674
 4 -0.3847772 -0.5775242 -1.3770679
 5  0.7353317  0.7299819  0.8762229
 6 -2.7377211  1.7076482 -0.7068022
 7  2.2256465 -0.2009983 -0.4477741
 8 -0.6155016 -1.8635734  0.8067342
 9  0.9166264 -0.5547236 -2.9339120
10  2.4023093 -0.6837390 -0.4874314

However, i have many columns and i would like to find a way to make the code simpler. I tried many things (like with mutate_all, mutate_at or add_columns) but nothing works...

Is there a consistent pattern to the columns that you want to subtract? — Randall Helms, Oct 11 '18 at 08:16
I am not sure to understand clearly what do you mean by consistent pattern; you mean a consistent lag? if it is that, in that case, yes it is. — F. Lyon, Oct 11 '18 at 08:53
By consistent pattern, I mean do you go in groups of three through the rest of the data set, as shown above? So you have diffDA, diffEB, diffFC, what would be next? Is is diffGD? — Randall Helms, Oct 11 '18 at 09:03

score 1 · Accepted Answer · answered Oct 11 '18 at 11:57

OK, here's a method that will work for the full width of your data set.

df1 <- tibble(A = rnorm(10),
        B = rnorm(10),
        C = rnorm(10),
        D = rnorm(10),
        E = rnorm(10),
        F = rnorm(10),
        G = rnorm(10),
        H = rnorm(10),
        I = rnorm(10))

ct <- 1:ncol(df1)

diff_tbl <- tibble(testcol = rnorm(10)) 

for (i in ct) {

  new_tbl <- tibble(col = df1[[i+3]] - df1[[i]])
  names(new_tbl)[1] <- paste('diff',colnames(df1[i+3]),colnames(df1[i]),sep='')

  diff_tbl <- bind_cols(diff_tbl,new_tbl)

}

diff_tbl <- diff_tbl %>%
  select(-testcol)

df1 <- bind_cols(df1,diff_tbl)

Basically, what you are doing is creating a second dummy tibble to compute the differences, iterating over the possible differences (i.e. gaps of three columns) then assembling them into a single tibble, then binding those columns to the original tibble. As you can see, I extended df1 by three extra columns and the whole thing worked like a charm.

It's probable that there's a more elegant way to do this, but this method definitely works. There's one slightly awkward thing in that I had to create the diff_tbl with a dummy column and then remove it before the final bind_cols() call, but it's not a major thing, I think.

That's a perfect solution because indeed my df has the pattern you described it above. — F. Lyon, Oct 11 '18 at 12:18

score 0 · Answer 2 · answered Oct 11 '18 at 07:50

0

You could divide the data frame in two parts and do

inds <- ncol(df1)/2

df1[paste0("diff", names(df1[(inds + 1):ncol(df1)]), names(df1[1:inds]))] <- 
           df1[(inds + 1):ncol(df1)] - df1[1:inds]

answered Oct 11 '18 at 07:50

Ronak Shah

286,338
16
97
143

Gregor Thomas · Answer 3 · 2018-10-11T08:09:05.587

Note that column names with dashes in them are improper and not recommended.

result = df1[4:6] - df1[1:3]
names(result) = paste(names(df1)[4:6], names(df1)[1:3], sep = "-")
result
#            D-A         E-B        F-C
# 1   0.12459065  0.05855622  0.6134559
# 2  -2.65583389  0.26425762  0.8344115
# 3  -1.48761765 -3.13999402  1.3008065
# 4  -4.37469763  1.37551178  1.3405191
# 5   1.01657135 -0.90690359  1.5848562
# 6  -0.34050959 -0.57687686 -0.3794937
# 7   0.85233808  0.57911293 -0.8896393
# 8   0.01931559  0.91385740  3.2685647
# 9  -0.62012982 -2.34166712 -0.4001903
# 10 -2.21764146  0.05927664  0.3965072

Creating new columns with mutate

3 Answers3