Why is assign() behaving oddly in a for() loop with dplyr pipes in R?

Question

I need to apply different functions to data frames in my Global Environment and save the output of each iteration of the loop in a new data frame whose name includes the original name. To this end, I'm using assign() inside a for() loop. It works well, except when I use the dplyr pipe %>%. The function itself works, but the name assigned to the output data frame is wrong. How can I fix this issue with %>%? If it can't be fixed, can I replace assign() with another function?

This works well:

code1:
for(i in unique(table$V1)){ 
    assign(paste0(i, "_target"),table[grepl(i,table$V1),])
  }

Explanation: Selects the unique entries in column 1 (V1) of "table" and, for each entry, subsets the matching rows into a new data frame. Output: the new data frame is named "entry name" + "_target".

This doesn't work well (and I would like to know why):

code2:
for(i in mget(ls(pattern = "_target"))){
    assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
  }

Explanation: Selects all data frames in the Global Env whose names contain "_target". In each data frame, it computes the mean of the values "(C__)" for rows that share the same "(Sample.Name)". Expected output: the new data frame is named "entry name_target" + "_slim". Actual output: the new data frame contains the per-group means, but is named "c(random numbers)_slim".

code2 input:

STA_target <- structure(list(Well = structure(c(8L, 9L, 10L, 21L, 22L, 23L, 
33L, 34L, 35L, 46L, 47L, 48L, 58L, 59L, 60L, 73L, 74L, 75L, 85L, 
86L, 87L, 97L, 98L, 99L), .Label = c("", "A1", "A10", "A11", 
"A12", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "Analysis Type", 
"B1", "B10", "B11", "B12", "B2", "B3", "B4", "B5", "B6", "B7", 
"B8", "B9", "C1", "C10", "C11", "C12", "C2", "C3", "C4", "C5", 
"C6", "C7", "C8", "C9", "Chemistry", "D1", "D10", "D11", "D12", 
"D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "E1", "E10", 
"E11", "E12", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9", 
"Endogenous Control", "Experiment File Name", "Experiment Run End Time", 
"F1", "F10", "F11", "F12", "F2", "F3", "F4", "F5", "F6", "F7", 
"F8", "F9", "G1", "G10", "G11", "G12", "G2", "G3", "G4", "G5", 
"G6", "G7", "G8", "G9", "H1", "H10", "H11", "H12", "H2", "H3", 
"H4", "H5", "H6", "H7", "H8", "H9", "Instrument Type", "Passive Reference", 
"Reference Sample", "RQ Min/Max Confidence Level", "Well"), class = "factor"), 
    Sample.Name = c("Control_in", "Control_in", "Control_in", 
    "Sample2_in", "Sample2_in", "Sample2_in", "Sample5_in", "Sample5_in", 
    "Sample5_in", "Sample3_in", "Sample3_in", "Sample3_in", "Control_c", 
    "Control_c", "Control_c", "Sample2_c", "Sample2_c", "Sample2_c", 
    "Sample3_c", "Sample3_c", "Sample3_c", "Sample5_c", "Sample5_c", 
    "Sample5_c"), Target.Name = c("STA", "STA", "STA", "STA", 
    "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", 
    "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", "STA", 
    "STA", "STA"), Task = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L), .Label = c("", "Task", "UNKNOWN"), class = "factor"), 
    Reporter = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
    ), .Label = c("", "Reporter", "SYBR"), class = "factor"), 
    Quencher = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
    ), .Label = c("", "None", "Quencher"), class = "factor"), 
    RQ = structure(c(12L, 12L, 12L, 8L, 8L, 8L, 6L, 6L, 6L, 11L, 
    11L, 11L, 1L, 1L, 1L, 5L, 5L, 5L, 14L, 14L, 14L, 18L, 18L, 
    18L), .Label = c("", "0.706286132", "0.714652956", "0.724364996", 
    "0.7665869", "0.828774512", "0.838611245", "0.846661508", 
    "0.863589227", "0.896049678", "0.929288268", "1", "1.829339266", 
    "15.57538891", "17.64183807", "27.67574501", "3.064466953", 
    "34.78881073", "41.82569504", "8.117406845", "8.884188652", 
    "RQ"), class = "factor"), RQ.Min = structure(c(9L, 9L, 9L, 
    7L, 7L, 7L, 8L, 8L, 8L, 10L, 10L, 10L, 1L, 1L, 1L, 2L, 2L, 
    2L, 21L, 21L, 21L, 17L, 17L, 17L), .Label = c("", "0.032458056", 
    "0.429091513", "0.460811675", "0.541289926", "0.611138761", 
    "0.674698055", "0.71383971", "0.742018044", "0.753834546", 
    "0.772591949", "0.7868222", "0.803419232", "0.820919514", 
    "0.826185584", "0.989573121", "22.58564949", "27.2142868", 
    "4.501103401", "4.745172024", "4.843928814", "4.979007244", 
    "9.076541901", "RQ Min"), class = "factor"), RQ.Max = structure(c(13L, 
    13L, 13L, 8L, 8L, 8L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 
    16L, 16L, 16L, 19L, 19L, 19L, 20L, 20L, 20L), .Label = c("", 
    "0.858568788", "0.910271943", "0.943540215", "0.947846115", 
    "0.962214947", "0.971821666", "1.062453985", "1.145578504", 
    "1.162549496", "1.218146205", "1.244680166", "1.347676158", 
    "14.63914394", "15.85231876", "18.10507202", "20.37916756", 
    "3.381742954", "50.08181381", "53.58541107", "64.28199768", 
    "65.58969879", "84.38751984", "RQ Max"), class = "factor"), 
    C_ = c(25.48042297, 25.4738903, 25.83390617, 25.7304306, 
    25.78297043, 25.41260529, 25.49670792, 25.52298164, 25.6956234, 
    25.34812355, 25.51462555, 25.15455437, 0, 0, 0, 32.29237366, 
    37.10370636, 32.22016525, 29.50172043, 30.18544579, 29.91492081, 
    25.14842796, 24.89806747, 24.99397278), C_.Mean = c(25.59607506, 
    25.59607506, 25.59607506, 25.64200401, 25.64200401, 25.64200401, 
    25.57177162, 25.57177162, 25.57177162, 25.33910179, 25.33910179, 
    25.33910179, NA, NA, NA, 33.87208176, 33.87208176, 33.87208176, 
    29.86736107, 29.86736107, 29.86736107, 25.01348877, 25.01348877, 
    25.01348877), C_.SD = structure(c(21L, 21L, 21L, 20L, 20L, 
    20L, 12L, 12L, 12L, 19L, 19L, 19L, 1L, 1L, 1L, 31L, 31L, 
    31L, 23L, 23L, 23L, 14L, 14L, 14L), .Label = c("", "0.039937571", 
    "0.043110434", "0.049541138", "0.05469643", "0.061177365", 
    "0.066671595", "0.07365533", "0.079849631", "0.082057081", 
    "0.095515646", "0.108060829", "0.120047837", "0.126316145", 
    "0.129658803", "0.130481929", "0.142733917", "0.172286868", 
    "0.180205062", "0.200392827", "0.205995336", "0.236968249", 
    "0.344334781", "0.36769405", "0.413046211", "0.445171326", 
    "0.514641941", "0.640576839", "0.895943522", "0.993181109", 
    "2.798901796", "C_ SD"), class = "factor"), `_C_` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "_C_"), class = "factor"), 
    `_C_.Mean` = structure(c(8L, 8L, 8L, 5L, 5L, 5L, 4L, 4L, 
    4L, 7L, 7L, 7L, 1L, 1L, 1L, 3L, 3L, 3L, 13L, 13L, 13L, 14L, 
    14L, 14L), .Label = c("", "_C_ Mean", "-0.577166259", "-0.68969661", 
    "-0.720502198", "-0.776381195", "-0.85484314", "-0.96064502", 
    "-1.058534026", "-2.04822278", "-2.545912504", "-3.293611526", 
    "-4.921841145", "-6.081196308", "0.477069855", "1.373315215", 
    "2.092705965", "2.244637728", "2.251055479", "2.346632004", 
    "2.456220627", "2.557917356", "2.729323149", "2.746313095"
    ), class = "factor"), `_C_.SE` = structure(c(13L, 13L, 13L, 
    11L, 11L, 11L, 6L, 6L, 6L, 9L, 9L, 9L, 1L, 1L, 1L, 24L, 24L, 
    24L, 21L, 21L, 21L, 15L, 15L, 15L), .Label = c("", "_C_ SE", 
    "0.042180877", "0.042606823", "0.048373949", "0.077573851", 
    "0.088320434", "0.102536619", "0.108728357", "0.113733612", 
    "0.117972165", "0.144372106", "0.155044988", "0.223316222", 
    "0.224465802", "0.258952528", "0.300881863", "0.306413502", 
    "0.319273174", "0.579304695", "0.606897891", "0.635279417", 
    "0.682336032", "1.643036604"), class = "factor"), HK.Control._C_.Mean = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ Mean"
    ), class = "factor"), HK.Control._C_.SE = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "HK Control _C_ SE"
    ), class = "factor"), `__C_` = structure(c(12L, 12L, 12L, 
    16L, 16L, 16L, 18L, 18L, 18L, 13L, 13L, 13L, 1L, 1L, 1L, 
    19L, 19L, 19L, 7L, 7L, 7L, 10L, 10L, 10L), .Label = c("", 
    "__C_", "-0.871322632", "-1.61563623", "-3.021018982", "-3.15124011", 
    "-3.961196184", "-4.140928745", "-4.790550232", "-5.120551586", 
    "-5.38631773", "0", "0.105801903", "0.15834935", "0.211582825", 
    "0.240142822", "0.253925949", "0.27094841", "0.383478791", 
    "0.465211242", "0.484685272", "0.501675308"), class = "factor"), 
    Automatic.Ct.Threshold = structure(c(3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L), .Label = c("", "Automatic Ct Threshold", 
    "TRUE"), class = "factor"), Ct.Threshold = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "0.056211855", 
    "0.208910329", "0.693888608", "0.704941193", "Ct Threshold"
    ), class = "factor"), Automatic.Baseline = structure(c(3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "Automatic Baseline", 
    "TRUE"), class = "factor"), Baseline.Start = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "3", "Baseline Start"
    ), class = "factor"), Baseline.End = structure(c(3L, 3L, 
    4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 13L, 14L, 14L, 8L, 
    12L, 8L, 6L, 7L, 7L, 3L, 3L, 3L), .Label = c("", "21", "22", 
    "23", "25", "26", "27", "29", "30", "31", "32", "34", "35", 
    "39", "Baseline End"), class = "factor"), Efficiency = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "1", "Efficiency"
    ), class = "factor"), Comments = structure(c(1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Comments"), class = "factor"), 
    HIGHSD = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L
    ), .Label = c("", "HIGHSD", "N", "Y"), class = "factor"), 
    NOAMP = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "N", "NOAMP", "Y"), class = "factor"), OUTLIERRG = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "N", "OUTLIERRG", 
    "Y"), class = "factor"), EXPFAIL = structure(c(3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L), .Label = c("", "EXPFAIL", "N", "Y"
    ), class = "factor")), .Names = c("Well", "Sample.Name", 
"Target.Name", "Task", "Reporter", "Quencher", "RQ", "RQ.Min", 
"RQ.Max", "C_", "C_.Mean", "C_.SD", "_C_", "_C_.Mean", "_C_.SE", 
"HK.Control._C_.Mean", "HK.Control._C_.SE", "__C_", "Automatic.Ct.Threshold", 
"Ct.Threshold", "Automatic.Baseline", "Baseline.Start", "Baseline.End", 
"Efficiency", "Comments", "HIGHSD", "NOAMP", "OUTLIERRG", "EXPFAIL"
), row.names = c(12L, 13L, 14L, 24L, 25L, 26L, 36L, 37L, 38L, 
48L, 49L, 50L, 60L, 61L, 62L, 72L, 73L, 74L, 84L, 85L, 86L, 96L, 
97L, 98L), class = "data.frame")

code2 "output":

> dput(`c(8, 9, 10, 21, 22, 23, 33, 34, 35, 46, 47, 48, 58, 59, 60, 73, 74, 75, 85, 86, 87, 97, 98, 99)_slim`)
structure(list(Group.1 = c("Sample2_c", "Sample2_in", "Sample3_c", 
"Sample5_in", "Control_c", "Control_in", "Sample5_c", "Sample3_in"
), x = c(33.8720817566667, 25.6420021066667, 29.8673623433333, 
25.5717709866667, 0, 25.5960731466667, 25.0134894033333, 25.3391011566667
)), .Names = c("Group.1", "x"), row.names = c(NA, -8L), class = "data.frame")

I don't know if this is really the output, because of the name it was given. The expected output should look something like this, but with the correct name: STA_slim

Thank you for your time

I strongly recommend the use of `lapply` on a list and saving the results in a list (named or otherwise) instead of using `get`/`assign` sequencing. Accessing individual results is as simple as `list`-indexing (`STA_target[["something"]]`), and because it operates purely functionally (avoiding side-effect of `assign`), it is more predictable and reproducible in non-standard workflows. See https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207. — r2evans, May 22 '19 at 15:12
You're using `i` to refer to the data frame itself as well as its name. — Hugh, May 22 '19 at 15:18

score 4 · Accepted Answer · answered May 22 '19 at 15:20

First of all, I strongly suggest you avoid assign() in your R code. It's much better to use one of the many mapping/apply functions in R to build related data in lists. Using get/assign is a sign that you are not doing things in a very R-like way.

Your problem really has nothing to do with dplyr; it's about what you are iterating over in your loop. When you do

  for(i in mget(ls(pattern = "_target"))){
    assign(paste0(i, "_slim"),data.frame(i %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
  }

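that i isn't the name of the data.frame; because you called mget(), it's the data frame itself. It doesn't make sense to paste that into a new name: paste0() turns each column into a deparsed string, which is where the "c(8, 9, 10, ...)_slim" name comes from (those are the integer codes of the first column, Well).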
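As a rough illustration of the point above, here is a minimal sketch using a made-up data frame (`toy_target` and its values are hypothetical, not the asker's data). It shows that pasting a data frame gives one string per column, while looping over the names keeps the label separate from the data.

```r
# Hypothetical toy data standing in for STA_target
toy_target <- data.frame(
  Sample.Name = c("Control_in", "Control_in", "Sample2_in"),
  C_ = c(25.4, 25.6, 25.7)
)

# mget() returns a named list, but for() hands the loop each value, not its name
targets <- mget("toy_target")
for (i in targets) {
  bad_names <- paste0(i, "_slim")  # one deparsed string per column
}
print(length(bad_names))  # equals ncol(toy_target), not 1

# Looping over the names keeps the label and the data separate
good_names <- character(0)
for (nm in names(targets)) {
  good_names <- c(good_names, paste0(nm, "_slim"))
}
print(good_names)  # "toy_target_slim"
```

When assign() receives a character vector longer than one, it uses only the first element as the variable name, which is why only the first column's deparsed values show up in the name.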

To "fix" this, you could do

for(i in ls(pattern = "_target")){
  assign(paste0(i, "_slim"),data.frame(get(i) %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C__))))
}

But even then you don't have a column named C__ in your example data set. You have C_ or _C_ or __C_ (what do these names even mean??). So you'd need to fix that.
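One more thing worth checking, going by the posted dput(): columns such as `_C_` and `__C_` have non-syntactic names and are stored as factors, so they need backticks (or `[[`) and a conversion before taking a mean. The sketch below uses hypothetical toy values and base R only; it is an illustration under those assumptions, not a statement about which column the asker actually intended.

```r
# Hypothetical toy data; check.names = FALSE keeps the non-syntactic name as-is
toy <- data.frame(
  Sample.Name = c("a", "a", "b"),
  `__C_` = factor(c("0", "0.5", "-1.5")),
  check.names = FALSE
)

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert through character first
vals <- as.numeric(as.character(toy[["__C_"]]))

# Per-group mean; inside dplyr the same column would be written as `__C_`
means <- tapply(vals, toy$Sample.Name, mean)
print(means)
```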

The better list way would be

slim <- lapply(mget(ls(pattern = "_target$")) , function(x) {
  x %>% group_by(Sample.Name) %>% summarise(Mean_dC=mean(C_))
})
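To show how the list result can be used afterwards, here is a small sketch with made-up inputs (`targets`, `XYZ_target`, and the values are hypothetical; in the question the list would come from mget(ls(pattern = "_target$"))). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy inputs standing in for the *_target data frames
targets <- list(
  STA_target = data.frame(Sample.Name = c("a", "a", "b"), C_ = c(1, 3, 5)),
  XYZ_target = data.frame(Sample.Name = c("a", "b", "b"), C_ = c(2, 4, 6))
)

# Same pattern as above: one summary per data frame, collected in a list
slim <- lapply(targets, function(x) {
  x %>% group_by(Sample.Name) %>% summarise(Mean_dC = mean(C_))
})

# lapply() keeps the list names, so they can be rewritten to the "_slim" form
names(slim) <- sub("_target$", "_slim", names(slim))
print(names(slim))

# Individual results are reached by name
print(slim[["STA_slim"]])

# Optionally stack everything into one data frame, tagged by source name
all_slim <- bind_rows(slim, .id = "source")
print(all_slim)
```

Keeping the results in one named list means no objects are created in the Global Environment as a side effect, and the names stay attached to the data they describe.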

Many thanks, @MrFlick. Why is assign bad practice in R? Fixing the mget() point worked well. About the (C_) column: it was a dirty trick to avoid italics on Stack Overflow — BeGentle, May 22 '19 at 16:49
@BeGentle This issue with assign is discussed here: https://stackoverflow.com/questions/17559390/why-is-using-assign-bad — MrFlick, May 22 '19 at 17:50
