I am trying to do forward selection and am having difficulties using string formulas:

> df <- data.frame(x0=c(1,2,3), x1=c(3,2,1), y=c(1,0,1))
> f0 <- lm("y ~ 1", data=df)
> f1 <- formula(lm("y ~ .", data=df))
> step(f0, direction="forward", scope=f1)
Start:  AIC=-2.51
y ~ 1

Error in eval(predvars, data, env) : 
  invalid 'envir' argument of type 'closure'

I know I could just omit the string-formula portion if I wanted:

> f0 <- lm(y ~ 1, data=df)
> f1 <- formula(lm(y ~ ., data=df))
> step(f0, direction="forward", scope=f1)
Start:  AIC=-2.51
y ~ 1

       Df Sum of Sq     RSS      AIC
<none>              0.66667 -2.51223
+ x0    1         0 0.66667 -0.51223
+ x1    1         0 0.66667 -0.51223

Call:
lm(formula = y ~ 1, data = df)

Coefficients:
(Intercept)  
     0.6667  

But I'd like to be able to dynamically name my dependent variable without having to hardcode it.

blacksite
  • There’s absolutely no need to use strings if you want to “dynamically name [your] dependent variable”. You can construct formulas at runtime from expressions, e.g. `as.formula(bquote(.(var) ~ .))`, where `var = as.name('y')`. – Konrad Rudolph Aug 19 '19 at 16:57
  • Also have a look at [Formula with dynamic number of variables](https://stackoverflow.com/q/4951442/10488504) – GKi Aug 20 '19 at 15:03

2 Answers

You're almost there. You just need to wrap your string formulas in as.formula, e.g.:

df <- data.frame(x0=c(1,2,3), x1=c(3,2,1), y=c(1,0,1))
f0 <- lm(as.formula("y ~ 1"), data=df)
f1 <- formula(lm(as.formula("y ~ ."), data=df))
step(f0, direction="forward", scope=f1)

# make some string formulae objects
step0 <- "y ~ 1"
step1 <- "y ~ ."

# use as.formula
s0 <- lm(as.formula(step0), data=df)
s1 <- formula(lm(as.formula(step1), data=df))
step(s0, direction="forward", scope=s1)
meenaparam

As @konrad-rudolph already suggested in the comments, you can use bquote to name the dependent variable dynamically in a regression:

dependentVariable  <- as.name("y")
f0 <- lm(as.formula(bquote(.(dependentVariable) ~ 1)), data=df)
f1 <- formula(lm(bquote(.(dependentVariable) ~ .), data=df))
step(f0, direction="forward", scope=f1)

Or, if you don't mind using strings, here is a solution close to @meenaparam's answer:

dependentVariable  <- "y"
f0 <- lm(as.formula(paste0(dependentVariable," ~ 1")), data=df)
predictors <- setdiff(names(df), dependentVariable) #Exact name match; grepl would also drop columns such as "y2"
#f1 <- formula(lm(as.formula(paste0(dependentVariable," ~ .")), data=df)) #Does call lm
#f1 <- as.formula(paste0(" ~ ", paste(predictors, collapse="+"))) #Does not call lm
f1 <- reformulate(predictors) #Or using reformulate which creates a formula from a character vector
step(f0, direction="forward", scope=f1)

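The error comes from the environment attached to the formula. When lm receives a string, the resulting formula carries an internal, temporary environment instead of the global one, so step does not find your df there when it refits the model: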

a <- lm("y ~ 1", data=df)
b <- lm(as.formula("y ~ 1"), data=df)
environment(formula(a)) #<environment: 0x56252c8a5fe0>
environment(formula(b)) #<environment: R_GlobalEnv>
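
To see why that environment matters, here is a small sketch that asks what the name df resolves to from each formula's environment. It reuses the toy data from the question; env_a and env_b are names introduced only for this illustration. On a typical R setup, the first lookup may well return a function rather than the data frame (the stats package has a density function called df), which would be consistent with the "closure" mentioned in the error message, but treat that as an assumption to verify on your own installation.

```r
# Same toy data as in the question
df <- data.frame(x0 = c(1, 2, 3), x1 = c(3, 2, 1), y = c(1, 0, 1))

a <- lm("y ~ 1", data = df)              # formula passed as a string
b <- lm(as.formula("y ~ 1"), data = df)  # formula converted before calling lm

# env_a and env_b are illustrative names, not part of the original answer
env_a <- environment(formula(a))
env_b <- environment(formula(b))

# Do the two fits share the same formula environment?
print(identical(env_a, env_b))

# What does the name "df" resolve to when looked up from each environment?
print(class(get0("df", envir = env_a)))
print(class(get0("df", envir = env_b)))
```

If the first class printed is not "data.frame", step is being handed something other than your data when it re-evaluates the model call.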

So the following minimal change to your code will work:

f0 <- lm(as.formula("y ~ 1"), data=df) #as.formula is added here
f1 <- formula(lm("y ~ .", data=df))
step(f0, direction="forward", scope=f1)
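
Putting the pieces together, here is a minimal end-to-end sketch in which the dependent variable is named only once. It assumes the toy data from the question; predictors and selected are illustrative names, and using the response argument of reformulate is one possible way to build the null model, not the only one.

```r
# Toy data from the question
df <- data.frame(x0 = c(1, 2, 3), x1 = c(3, 2, 1), y = c(1, 0, 1))

# The only place where the dependent variable is named
dependentVariable <- "y"

# Every other column is a candidate predictor (exact name match)
predictors <- setdiff(names(df), dependentVariable)

# Null model formula (response ~ 1) and upper scope (~ all candidates)
f0 <- reformulate("1", response = dependentVariable)
f1 <- reformulate(predictors)

# Both are real formula objects, so step can refit the model
fit0 <- lm(f0, data = df)
selected <- step(fit0, direction = "forward", scope = f1, trace = 0)

print(formula(selected))
```

Setting trace = 0 only suppresses the step-by-step table; drop it to get the same output as shown in the question.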
GKi