7

I am trying to understand the meaning of this statement in R code written by somebody else.

mymodel = lm(gene ~ ., data = mydata) 

mydata is as follows:

> mydata
                 gene    cna rs11433683      PC1    PC2
TCGA.BH.A0C0 270.7446 0.1291          0 270.7446 0.1291
TCGA.A2.A3XY  87.9092 0.0128          1  87.9092 0.0128
TCGA.XX.A89A 255.1346 0.1530          1 255.1346 0.1530

I have gone through the R help pages to find out how . is interpreted. I understand that . is not typically used, but this is what I found:

help(formula)

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’

help(terms.formula)

allowDotAsName: normally . in a formula refers to the remaining variables contained in data. Exceptionally, . can be treated as a name for non-standard uses of formulae.

data: a data frame from which the meaning of the special symbol . can be inferred. It is unused if there is no . in the formula.

However, I am not really sure what these statements mean. Can somebody give me a simple example of what . means in the context of the statement and data I mentioned above?

alistaire
alpha_989
  • 1
    https://stats.stackexchange.com/questions/10712/what-is-the-meaning-of-the-dot-in-r – Severin Pappadeux Aug 12 '17 at 23:09
  • 1
    It means: use all the other variables (cna, rs..., PC1 and PC2) as independent variables in the model. – ayhan Aug 12 '17 at 23:14
  • 3
    It's exactly what it says it is: `all the columns` (from the data supplied to the `data` parameter) `not otherwise in the formula`. In this case, since `gene` is supplied, the rest are taken as explanatory variables, so `gene ~ .` is equivalent to `gene ~ cna + rs11433683 + PC1 + PC2`. Explanations will only go so far, though; try it out and look at the difference in the resulting model. – alistaire Aug 12 '17 at 23:58
  • 2
    \*To be clear, that's what `.` means _in a formula_. Some packages use it to mean other things, particularly when piping or as a function. – alistaire Aug 13 '17 at 00:11
  • Thanks, guys, that makes a lot of sense now. I wish an example like the one you just mentioned were included in the help() pages; I wouldn't have spent so many hours banging my head and searching the help files. Is there a way to suggest this change to the R documentation? – alpha_989 Aug 13 '17 at 00:41

2 Answers

6

in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’

Exactly what it says there on the box!

So with

 mymodel = lm(gene ~ ., data = mydata) 

you get every variable other than gene that's in mydata on the RHS of the formula:

   cna + rs11433683 + PC1 + PC2

As far as I can see, the quoted phrase is clear and unambiguous (though you could also see it just by trying a few small examples).
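One such small example, as a sketch only: the data frame `d` below is hypothetical (same column names as mydata, made-up values), and `terms()` is used to expand the dot without fitting anything.

```r
# Hypothetical data with the same column names as mydata (values are made up).
set.seed(1)
d <- data.frame(
  gene       = rnorm(20, mean = 200, sd = 50),
  cna        = rnorm(20, mean = 0.1, sd = 0.05),
  rs11433683 = rbinom(20, size = 1, prob = 0.5),
  PC1        = rnorm(20),
  PC2        = rnorm(20)
)

# Ask R which terms the dot expands to, given this data frame.
dot_terms <- attr(terms(gene ~ ., data = d), "term.labels")
print(dot_terms)

# Fit the model both ways and compare the estimated coefficients.
fit_dot      <- lm(gene ~ ., data = d)
fit_explicit <- lm(gene ~ cna + rs11433683 + PC1 + PC2, data = d)
same_coefs   <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
print(same_coefs)
```

If the two fits agree, that is a quick confirmation that the dot is only shorthand for the remaining columns of the data frame.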

The only thing that might not be obvious is what it does if you don't supply a data argument (but that's answered in the help for terms.formula, which is referred to in your quote).
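A rough way to check the no-data case yourself, assuming hypothetical vectors `gene` and `cna` sitting in the workspace instead of a data frame. As far as I know, the dot then has nothing to expand into and lm() stops with an error; the exact wording of the message may differ between R versions.

```r
# Hypothetical variables in the workspace rather than in a data frame.
gene <- c(270.7, 87.9, 255.1, 140.2)
cna  <- c(0.13, 0.01, 0.15, 0.07)

# Without `data`, the dot cannot be expanded, so capture the error message.
result <- tryCatch(
  lm(gene ~ .),
  error = function(e) conditionMessage(e)
)
print(result)

# Naming the predictor explicitly works without a `data` argument.
fit <- lm(gene ~ cna)
print(coef(fit))
```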

Glen_b
  • I agree that, in hindsight, the quoted text seemed clear as soon as I saw the example. However, "The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula" doesn't necessarily lead the reader to understand that y ~ . is equivalent to y ~ a + b + c, where a, b and c are the other columns of data, especially if they are new to the field. The help section is written in dense text and assumes a high degree of familiarity with R. – alpha_989 Aug 13 '17 at 14:38
-1

It means you are regressing gene on all the other variables in mydata.

ZWL