
I am trying to pass a formula, built from character strings, to lm(). However, when I do the following:

independend_vars <- c("PC_1_food_men","covar_prev_diab")
dependent_var <- c("PC_1_mets_men", "PC_2_mets_men", "PC_3_mets_men")

var_names <- independend_vars

formula <- as.formula(paste0(dependent_var, "~", paste0(var_names, collapse = "+")))

I get the following error:

Warning:
Using formula(x) is deprecated when x is a character vector of length > 1.
  Consider formula(paste(x, collapse = " ")) instead. 

Does anyone know where the problem is?

Jzlia10
  • Try only `formula()` – Duck Jul 22 '20 at 11:42
  • We need an example, please, as your code should work, i.e. try it for `dependent_var = "y" ; var_names = c("x1", "x2")`. PS: there is also `?reformulate`. Ah, I'm guessing you have multiple dependent variables? i.e. `dependent_var = c("y", "z")` – user20650 Jul 22 '20 at 11:44
  • Check `paste0(dependent_var, "~", paste0(var_names, collapse = "+"))` and see why the warning message says it has length > 1. – Rui Barradas Jul 22 '20 at 11:46
  • @Duck Unfortunately formula() does not work. – Jzlia10 Jul 22 '20 at 11:58
  • @Rui Barradas I get the right content when checking just paste0(dependent_var, "~", paste0(var_names, collapse = "+")): "PC_1_mets_men~PC_1_food_men+covar_prev_diab" "PC_2_mets_men~PC_1_food_men+covar_prev_diab" "PC_3_mets_men~PC_1_food_men+covar_prev_diab"..... However, the complete syntax with as.formula() throws an error – Jzlia10 Jul 22 '20 at 12:02
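Rui Barradas's suggested check can be made explicit. A minimal sketch, reusing the question's own variable names: paste0() is vectorised over dependent_var, so the result is three separate formula strings rather than one formula, and each element on its own converts cleanly.

```r
independend_vars <- c("PC_1_food_men", "covar_prev_diab")
dependent_var <- c("PC_1_mets_men", "PC_2_mets_men", "PC_3_mets_men")

# One string per element of dependent_var
string_form <- paste0(dependent_var, "~", paste0(independend_vars, collapse = "+"))

length(string_form)
## [1] 3

# A single element is a valid one-response formula
as.formula(string_form[2])
## PC_2_mets_men ~ PC_1_food_men + covar_prev_diab
```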

2 Answers


There was a warning (not an error) because dependent_var in the question has more than one element, so the pasted result is a character vector of three formula strings; the warning is letting you know that all but the first element are ignored. Also note that you don't have to convert the string to a formula, as lm will accept a character string, but if given a character vector of length > 1 it will likewise ignore all but the first element and give a similar warning.
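As a small illustration of the point that lm accepts a character string, here is a sketch using the built-in mtcars data as a stand-in (the variables mpg and wt are not from the question):

```r
# The string is converted to a formula internally by lm's model-frame step
fit <- lm("mpg ~ wt", data = mtcars)
class(fit)
## [1] "lm"

all.vars(formula(fit))
## [1] "mpg" "wt"
```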

To fit all of the dependent variables at once, in a single multivariate model, we can modify the code in the question to this:

paste(sprintf("cbind(%s)", toString(dependent_var)), "~", 
  paste(var_names, collapse = " + "))

giving:

[1] "cbind(PC_1_mets_men, PC_2_mets_men, PC_3_mets_men) ~ PC_1_food_men + covar_prev_diab"

However, using reformulate, as in the next section, is a bit easier.
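To check that the pasted cbind() string is usable as it stands, here is a minimal sketch that applies the same paste/sprintf construction to the built-in CO2 data, with conc and uptake standing in for the question's dependent variables:

```r
dep_vars <- c("conc", "uptake")        # stand-in responses from CO2
indep_vars <- c("Type", "Treatment")   # stand-in predictors from CO2

# Same construction as above: one string with a cbind() left-hand side
s <- paste(sprintf("cbind(%s)", toString(dep_vars)), "~",
           paste(indep_vars, collapse = " + "))
s
## [1] "cbind(conc, uptake) ~ Type + Treatment"

# A matrix response gives a multivariate ("mlm") fit
fit <- lm(as.formula(s), data = CO2)
class(fit)
## [1] "mlm" "lm"
```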

reformulate

Instead, we can form the LHS using sprintf and then use that, together with the independent variables, in reformulate. We use the built-in CO2 data set so that we can actually run the result:

dep_vars <- names(CO2)[4:5]    # c("conc", "uptake")
indep_vars <- names(CO2)[2:3]  # c("Type", "Treatment")

fo <- reformulate(indep_vars, sprintf("cbind(%s)", toString(dep_vars)))
fo
## cbind(conc, uptake) ~ Type + Treatment

lm(fo, CO2)

giving:

Call:
lm(formula = fo, data = CO2)

Coefficients:
                  conc        uptake    
(Intercept)        4.350e+02   3.697e+01
TypeMississippi   -5.582e-14  -1.266e+01
Treatmentchilled   0.000e+00  -6.860e+00
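The two coefficient columns above come from the fact that a cbind() left-hand side makes lm return a multivariate fit. A short sketch for inspecting that object, using the same fo as above:

```r
fo <- reformulate(names(CO2)[2:3], sprintf("cbind(%s)", toString(names(CO2)[4:5])))
fit <- lm(fo, CO2)

class(fit)
## [1] "mlm" "lm"

# Coefficients are a matrix: one row per term, one column per response
dim(coef(fit))
## [1] 3 2
colnames(coef(fit))
## [1] "conc"   "uptake"
```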

The question had multiple dependent variables, but if there were only one, we could simplify the reformulate statement. For example, to use only the first dependent variable:

reformulate(indep_vars, dep_vars[1])
## conc ~ Type + Treatment
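If separate single-response models are wanted instead of one multivariate fit, the simpler reformulate call can be applied to each dependent variable in turn. A sketch on the CO2 data, which also checks that each separate fit reproduces the matching column of the cbind() fit:

```r
dep_vars <- names(CO2)[4:5]    # c("conc", "uptake")
indep_vars <- names(CO2)[2:3]  # c("Type", "Treatment")

# One single-response formula, and one lm fit, per dependent variable
fits <- lapply(dep_vars, function(y) lm(reformulate(indep_vars, y), data = CO2))
names(fits) <- dep_vars

# Multivariate fit for comparison
mfit <- lm(reformulate(indep_vars, sprintf("cbind(%s)", toString(dep_vars))), data = CO2)
all.equal(coef(fits$uptake), coef(mfit)[, "uptake"])
## [1] TRUE
```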

Nicer looking Call line

The Call: line above shows the formula argument literally as fo, but we can use do.call, which passes the value of fo rather than its name, to produce a nicer looking Call: line.

do.call("lm", list(fo, quote(CO2)))

giving:

Call:
lm(formula = cbind(conc, uptake) ~ Type + Treatment, data = CO2)

Coefficients:
                  conc        uptake    
(Intercept)        4.350e+02   3.697e+01
TypeMississippi   -5.582e-14  -1.266e+01
Treatmentchilled   0.000e+00  -6.860e+00
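A different technique with the same effect (not the one used in this answer) is to splice the formula into the call with bquote() before evaluating it. A sketch:

```r
fo <- reformulate(names(CO2)[2:3], sprintf("cbind(%s)", toString(names(CO2)[4:5])))

# .(fo) is replaced by the value of fo, so the stored call contains
# the formula itself rather than the name fo
fit <- eval(bquote(lm(.(fo), data = CO2)))
fit$call
## lm(formula = cbind(conc, uptake) ~ Type + Treatment, data = CO2)
```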
G. Grothendieck

You have a vector of dependent variables, so your paste0 call produces a vector of formula strings, one per dependent variable. as.formula can only convert one at a time; given the whole vector, it uses the first element and warns:

independend_vars <- c("PC_1_food_men","covar_prev_diab")
dependent_var <- c("PC_1_mets_men", "PC_2_mets_men", "PC_3_mets_men")

var_names <- independend_vars
string_form <- paste0(dependent_var, "~", paste0(var_names, collapse = "+"))

string_form
#> [1] "PC_1_mets_men~PC_1_food_men+covar_prev_diab"
#> [2] "PC_2_mets_men~PC_1_food_men+covar_prev_diab"
#> [3] "PC_3_mets_men~PC_1_food_men+covar_prev_diab"

as.formula(string_form)
#> Warning: Using formula(x) is deprecated when x is a character vector of length > 1.
#>   Consider formula(paste(x, collapse = " ")) instead.
#> PC_1_mets_men ~ PC_1_food_men + covar_prev_diab

If you want 3 different formulas, you can apply as.formula to each string with lapply:

lapply(string_form, as.formula)
#> [[1]]
#> PC_1_mets_men ~ PC_1_food_men + covar_prev_diab
#> <environment: 0x0000000015620b28>
#> 
#> [[2]]
#> PC_2_mets_men ~ PC_1_food_men + covar_prev_diab
#> <environment: 0x0000000015620b28>
#> 
#> [[3]]
#> PC_3_mets_men ~ PC_1_food_men + covar_prev_diab
#> <environment: 0x0000000015620b28>
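The resulting list can then be fitted one formula at a time. Since the question's PC_* data are not available, this sketch assumes the built-in CO2 data as a stand-in, with conc and uptake playing the role of the dependent variables:

```r
# Stand-in strings built the same way as string_form above
string_form <- paste0(c("conc", "uptake"), "~", paste0(c("Type", "Treatment"), collapse = "+"))

# Convert each string separately, then fit one model per formula
formulas <- lapply(string_form, as.formula)
fits <- lapply(formulas, function(f) lm(f, data = CO2))
names(fits) <- c("conc", "uptake")

lapply(fits, coef)
```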

If you don't want 3 formulas, it's not clear to me what you're trying to do.

Created on 2020-07-22 by the reprex package (v0.3.0)

Allan Cameron