
I'm seeing some non-intuitive behaviour in R.

According to the documentation:

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

Being new to R, I don't yet have a full mental model of what is supposed to happen, but this sounds similar to the 'and' vs 'conditional and' seen in other programming languages (also called short-circuiting logical operators).
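As a minimal sketch of the distinction the documentation describes (the variable names here are made up for illustration): `&` works element by element and always evaluates both sides, while `&&` works on single values and stops as soon as the result is known.

```r
# Elementwise AND: one result per pair of elements.
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)
elementwise <- x & y
print(elementwise)     # [1]  TRUE FALSE FALSE

# Short-circuit AND on length-one operands: the right-hand side is
# never evaluated because the left-hand side is already FALSE.
short_circuit <- FALSE && stop("never evaluated")
print(short_circuit)   # [1] FALSE

# `&` evaluates both operands, so the same right-hand side raises an
# error, which is caught here and turned into a marker string.
eager <- tryCatch(FALSE & stop("evaluated"), error = function(e) "error")
print(eager)           # [1] "error"
```

This is why `&&` suits `if` conditions, where a single TRUE or FALSE is wanted, and `&` suits building a logical mask over a whole column.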

Create a data frame:

mhbins <- data.frame(val=seq(0,10),bin=rep(c(NA),length.out=11))

   val bin
1    0  NA
2    1  NA
3    2  NA
4    3  NA
5    4  NA
6    5  NA
7    6  NA
8    7  NA
9    8  NA
10   9  NA
11  10  NA

Then patch it up:

mhbins$bin[1 <= mhbins$val & mhbins$val <= 7] <- "some"

   val  bin
1    0 <NA>
2    1 some
3    2 some
4    3 some
5    4 some
6    5 some
7    6 some
8    7 some
9    8 <NA>
10   9 <NA>
11  10 <NA>

This is expected. (Note that the printed form of the NA value changes, which was another mystery. Commenter Tensibai explains: "NA is the "numeric" NA, and <NA> is the character one, as a vector (which a df column is) can only be of one type, when you enter a character value it gets coerced to character and that's why the NA representation change.")
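The coercion can be checked on a plain vector. This is a small sketch with a made-up vector `bin`, not the data frame above. Strictly speaking, a bare `NA` has type logical rather than numeric, but the coercion argument is the same.

```r
bin <- rep(NA, length.out = 3)
class_before <- class(bin)
print(class_before)    # [1] "logical"

# Assigning a string forces the whole vector to character,
# because a vector can hold only one type.
bin[2] <- "some"
class_after <- class(bin)
print(class_after)     # [1] "character"

# The remaining elements are still missing, now as character NAs,
# which a data frame prints as <NA>.
still_missing <- is.na(bin)
print(still_missing)   # [1]  TRUE FALSE  TRUE
```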

However, with &&, the patch operation leaves every value as NA:

mhbins <- data.frame(val=seq(0,10),bin=rep(c(NA),length.out=11))

   val bin
1    0  NA
2    1  NA
3    2  NA
4    3  NA
5    4  NA
6    5  NA
7    6  NA
8    7  NA
9    8  NA
10   9  NA
11  10  NA

mhbins$bin[1 <= mhbins$val && mhbins$val <= 7] <- "some"

   val  bin
1    0 <NA>
2    1 <NA>
3    2 <NA>
4    3 <NA>
5    4 <NA>
6    5 <NA>
7    6 <NA>
8    7 <NA>
9    8 <NA>
10   9 <NA>
11  10 <NA>

I don't understand what's going on here.

David Tonhofer
  • `NA` is the "numeric" NA, and `<NA>` is the character one, as a vector (which a df column is) can only be of one type, when you enter a character value it gets coerced to character and that's why the NA representation change. – Tensibai Feb 09 '17 at 12:52
  • @Tensibai Thank you, I will insert your comment into the question. – David Tonhofer Feb 09 '17 at 13:26
  • In the end, this question is less about & vs && which is well answered in the link-to-duplicate but about "what does R think it is doing with ' – David Tonhofer Feb 09 '17 at 14:28
  • The source of your question is how & and && behave, the assignment is still an assignment, you just have a glitch on the subset: mainly, only the & version will return a logical vector for all values, where && will return only one TRUE or FALSE, hence whole df or nothing. – Tensibai Feb 09 '17 at 14:31
  • @Tensibai Well, no because `mhbins$bin[1 <= mhbins$val && mhbins$val <= 7] – David Tonhofer Feb 09 '17 at 16:00
  • Of course `mhbins$bin[1 <= mhbins$val & mhbins$val <= 7]` does not yield a vector of positions but the values for which the expression inside `[]` is TRUE. Try just `1 <= mhbins$val & mhbins$val <= 7`: you'll see it is a vector of TRUE/FALSE values, which is used to subset the column, so `mhbins$bin[c(FALSE,FALSE,TRUE,TRUE,TRUE,FALSE)]` selects the same thing as `mhbins$bin[c(3,4,5)]` or, with a range, `mhbins$bin[3:5]`. – Tensibai Feb 09 '17 at 16:05
  • @Tensibai Thanks. Time for more "R in Action" I reckon. – David Tonhofer Feb 13 '17 at 21:30
  • If you haven't read it already, have a look at R inferno, getting your mind around the vectorial way of computing in R is the base for everything after, once you get the idea everything gets clearer – Tensibai Feb 13 '17 at 21:36
  • side note, the short circuit happens for both operators, but at different level, an understanding of vector-based operations is necessary to get over it. – Tensibai Feb 13 '17 at 21:37
  • @Tensibai Yes, I will say. Not really, just somewhat like SQL. And I have the feeling the "factor" injects even more problems, what a weird concept. I will sort it out eventually. – David Tonhofer Feb 13 '17 at 21:42
  • Factors are the same thing as what you do in Sql for 'categories' to save space, an integer 'id' mapping the whole name to avoid repeating the same text N time. Kinda usual isn't it ? – Tensibai Feb 13 '17 at 21:45
  • TBH unless you need some category-based functions that take factors as input, or you are hitting a memory limit, avoid them for ease of coding until you're more comfortable with the vectorial approach, which is more or less SELECT-based computing in SQL. Think of working on a full column of a table instead of row by row as the target. – Tensibai Feb 13 '17 at 21:48
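The logical-versus-positional indexing discussed in the comments above can be tried on a small vector. This is only a sketch; the vector `bin` and its contents are made up for illustration.

```r
bin <- c("a", "b", "c", "d", "e", "f")

# A logical mask, explicit positions and a range all select the same elements.
by_logical  <- bin[c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)]
by_position <- bin[c(3, 4, 5)]
by_range    <- bin[3:5]
print(by_logical)      # [1] "c" "d" "e"

# which() turns a logical mask into the matching positions.
positions <- which(c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE))
print(positions)       # [1] 3 4 5

# A single logical index is recycled over the whole vector:
# TRUE selects everything, FALSE selects nothing.
everything <- bin[TRUE]
nothing    <- bin[FALSE]
print(length(everything))   # [1] 6
print(length(nothing))      # [1] 0
```

The last two lines are the "whole df or nothing" case: a single TRUE or FALSE inside `[]` is applied to every element.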

1 Answer


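Here '&' returns a logical vector: it applies 'and' to each pair of elements from 1 <= mhbins$val and mhbins$val <= 7. '&&', by contrast, looks only at the first pair, so it returns a single value. In the question the first pair is 1 <= 0 and 0 <= 7, which gives FALSE, so the index selects no rows.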

Example

c(TRUE, TRUE) & c(FALSE, TRUE)    # returns [1] FALSE  TRUE
c(TRUE, TRUE) && c(FALSE, TRUE)   # returns [1] FALSE (an error since R 4.3.0, where && requires length-one operands)
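Applied to the data frame from the question, a sketch of both cases might look like this. Because recent R versions (4.3.0 and later) raise an error when `&&` is given operands longer than one, the "first pair only" behaviour is emulated here by indexing with `[1]` explicitly. The names `mask` and `first_only` are made up for illustration.

```r
mhbins <- data.frame(val = seq(0, 10), bin = rep(NA, length.out = 11))

# Elementwise mask: one TRUE/FALSE per row.
mask <- 1 <= mhbins$val & mhbins$val <= 7
print(sum(mask))                  # [1] 7

# What && effectively computed in older R: the first pair only.
# 1 <= 0 is FALSE, so the right-hand side is not even evaluated.
first_only <- (1 <= mhbins$val[1]) && (mhbins$val[1] <= 7)
print(first_only)                 # [1] FALSE

# Indexing with a single FALSE selects no rows, so nothing is assigned,
# but the column is still coerced to character (hence the <NA> printout).
mhbins$bin[first_only] <- "some"
print(all(is.na(mhbins$bin)))     # [1] TRUE
print(class(mhbins$bin))          # [1] "character"

# Indexing with the full mask patches the seven matching rows.
mhbins$bin[mask] <- "some"
print(sum(!is.na(mhbins$bin)))    # [1] 7
```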
anonR
  • Ok, hold on. leaving out `&` vs `&&`. So `mhbins$bin[ 1 <= mhbins$val & mhbins$val <= 7]` returns a vector. But then the assignment operation ` – David Tonhofer Feb 09 '17 at 13:13