Best way in R to pick which level is the base category for a factor in an lm regression

Question

Suppose I want to run a regression using lm and a factor as a right hand side variable. What is the best way to choose which level in the factor is the base category (the one that is excluded to avoid multicollinearity). Note that I am not interested in excluding the intercept because I have many factors.

I would also like a formula-based solution, not one that acts on the data.frame directly, although if you think you have a really good solution for that, please post it as well.

My solution is:

base_cat <- function(x) c(x,1:(x-1),(x+1):100) 
a_reg <- lm(y ~ x1 + x2 + factor(x3, levels=base_cat(30)) #suppose that x3 has draws from the integers 1 to 100.

The left out category by lm is the first level in the factor so this just reorders the levels so that the one specified in base_cat() is the first one, and puts the rest after.

Any other ideas?

score 6 · Accepted Answer · answered Oct 19 '11 at 21:41

6

The function relevel does precisely this. You pass it an unordered factor and the name of the reference level and it returns a factor with that level as the first one.

answered Oct 19 '11 at 21:41

joran

157,274
30
404
439

Best way in R to pick which level is the base category for a factor in an lm regression

1 Answers1