I want to predict a numerical variable. I have a couple of factors, and for each of these factors I have a numerical equivalent. Ideally I would assign that numerical equivalent to the factor and use it in the prediction. Is this possible? If not, I guess I will need to replace the factors with their numerical equivalents. What is the best way to do so?

An Example:

df = data.frame(f=c("a","b","a","c"),v=c(2,4,2,6))
lookup = data.frame(name=c("a","b","c"),v=c(1,2,3))

What I would like to get:

df2 = data.frame(f=c(1,2,1,3),v=c(2,4,2,6))
cor(df2$f,df2$v) # will be 1
Matthew Lundberg
nik
  • How do you mean, the factors have numerical equivalents? Factors are categories. When you say prediction, what do you mean? – TARehman Jul 15 '14 at 16:11
  • R treats factors as categorical variables and numeric values as continuous variables. The two types of variables often have different statistical methods associated with them and the interpretation of a model differs by variable type. You really should decide what type of analysis is appropriate for your data first. – MrFlick Jul 15 '14 at 16:14
  • I added an example to make it clearer. The letters are what I have; the numbers in the lookup table are average values I calculated earlier and would now like to use. – nik Jul 15 '14 at 16:43

2 Answers

Alternatively, merge the lookup table into the data frame:

df2 <- merge(df, lookup, by.x = "f", by.y = "name")
cor(df2[, 2], df2[, 3])

Or if your data sets are big

library(data.table)
setkey(setDT(df), f)
setkey(setDT(lookup), name)
df2 <- df[lookup]
cor(df2[, 2, with = FALSE], df2[, 3, with = FALSE])
David Arenburg

Does this help?

cor(lookup$v[match(df$f,lookup$name)],df$v)
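
The one-liner above can be unpacked into separate steps, which also leaves the numeric equivalent in the data frame for later modelling. This is only a sketch using the data from the question; the column name `f_num` is an assumption made for illustration.

```r
# Data from the question
df <- data.frame(f = c("a", "b", "a", "c"), v = c(2, 4, 2, 6))
lookup <- data.frame(name = c("a", "b", "c"), v = c(1, 2, 3))

# match() returns, for each element of df$f, its position in lookup$name
# (NA where a value of df$f has no entry in the lookup table)
idx <- match(df$f, lookup$name)

# f_num is a hypothetical column name holding the numeric equivalent
df$f_num <- lookup$v[idx]

# Same correlation as the one-liner, computed from the new column
cor(df$f_num, df$v)
```

Unlike merge(), this keeps the rows of df in their original order.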
Jörg Mäder
  • Thanks, that works as well, but only if a single column is needed for identification. I need more (even though that was not included in my example). – nik Jul 16 '14 at 08:45
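
For the case raised in the last comment, where more than one column identifies the numeric equivalent, one possible approach is merge() with a vector of key columns. The sketch below uses made-up data; `df_multi`, `lookup_multi`, and the columns `g` and `num` are hypothetical names, not part of the original question.

```r
# Hypothetical data: the numeric equivalent depends on two columns, f and g
df_multi <- data.frame(
  f = c("a", "b", "a", "c"),
  g = c("x", "x", "y", "y"),
  v = c(2, 4, 6, 8)
)
lookup_multi <- data.frame(
  f = c("a", "a", "b", "c"),
  g = c("x", "y", "x", "y"),
  num = c(1, 3, 2, 4)
)

# Join on both key columns; all.x = TRUE keeps rows of df_multi that have
# no match in the lookup table (their num is NA)
df_joined <- merge(df_multi, lookup_multi, by = c("f", "g"), all.x = TRUE)

cor(df_joined$num, df_joined$v)
```

Note that merge() sorts the result by the key columns, so the row order of df_joined may differ from that of df_multi.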