Find different kinds of missing values with one command - syntax error?

Question

I seem to have a syntactical error that I am not able to find.

I have extracted a column of a data frame into a list called column. I want to find all the missing values in this column. However, for some reason the whole list is returned.

Here's my attempt

> length(column)
[1] 712789
> length(column[column == ""])
[1] 24181
> length(column[column == "0"])
[1] 24181
> length(column[is.na(column)])
[1] 24181
> length(column[column == "" || column == "0" || is.na(column)])
[1] 712789

This is strange. I would expect the last subset to have length 24181 as well. Even if the subsets above all referred to different elements, the result shouldn't be longer than 24181 x 3 = 72543. Instead, the whole list is part of the subset.

What am I doing wrong?

[edit]
Out of curiosity I tried every combination of two (instead of three) subsets: the result was also 712789 each time.

G. Grothendieck · Accepted Answer · 2019-10-22T12:24:32.220

1

The last line of code should use | rather than ||.

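A single vertical bar is vectorized: it compares its operands element by element and returns a logical vector of the same length. The double bar only works with scalars and returns a single TRUE or FALSE. In the question that single value was TRUE, and a single TRUE used as a subscript is recycled over the whole vector, so column[TRUE] returns every element.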
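As a minimal sketch of that difference, the snippet below uses a made-up four-element vector `x` (hypothetical data, not the asker's column). Because R 4.3.0 and later raise an error when `||` receives an operand longer than one, while older versions silently used only the first element, the sketch applies `||` to the first element explicitly so that it should run on either.

```r
# Hypothetical toy data standing in for the asker's column
x <- c("", "a", "0", NA)

# Vectorized OR: one logical value per element of x
elementwise <- x == "" | x == "0" | is.na(x)
print(elementwise)

# Scalar OR on the first element only: a single logical value
first_only <- x[1] == "" || x[1] == "0" || is.na(x[1])
print(first_only)

# A single TRUE subscript is recycled, so every element is selected
n_scalar <- length(x[first_only])

# The element-wise subscript selects only the matching elements
n_vector <- length(x[elementwise])
print(c(n_scalar, n_vector))
```

Here the scalar subscript selects all four elements, while the element-wise one selects only the three that are empty, "0", or NA.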

Suppose the ith value of column is "". Then the ith value of the result is "" == "" | "" == "0" | is.na("") which equals TRUE | FALSE | FALSE which is TRUE.

Suppose the ith value of column is "0". Then the ith value of the result is "0" == "" | "0" == "0" | is.na("0") which equals FALSE | TRUE | FALSE which is TRUE.

Suppose the ith value of column is NA. Then the ith value of the result is NA == "" | NA == "0" | is.na(NA) which equals NA | NA | TRUE which is TRUE.

Thus the ith value of the result is TRUE for any of the conditions. It is FALSE otherwise.
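The three cases above can be checked directly. This is a small sketch on a hypothetical vector `x` (not the asker's data); it also shows that a comparison on its own leaves NA in the subscript, and that a subscript containing NA contributes an NA entry to the subset, which can inflate a count taken with `length()`.

```r
# Hypothetical toy data: one empty string, one ordinary value, one "0", one NA
x <- c("", "a", "0", NA)

# NA combined with TRUE is TRUE, but NA combined with FALSE stays NA
na_or_true  <- NA | TRUE
na_or_false <- NA | FALSE

# A bare comparison yields NA for the NA element ...
eq_empty <- x == ""
print(eq_empty)

# ... and subsetting with that NA adds an NA entry to the result
subset_empty <- x[eq_empty]
print(subset_empty)

# The combined condition has no NA, because is.na() supplies TRUE there
is_missing <- x == "" | x == "0" | is.na(x)
print(which(is_missing))
print(sum(is_missing))
```

Counting with `sum(is_missing)` rather than `length()` of a subset avoids depending on how NA subscripts are handled.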

For more information see: Boolean operators && and ||

edited Oct 22 '19 at 12:24

answered Oct 21 '19 at 15:12

G. Grothendieck


Thank you! Shocking, I never realised the big difference between bitwise and logical operators :O – speendo Oct 21 '19 at 15:25
But just out of curiosity: why does that make a difference? – speendo Oct 21 '19 at 15:44
yes, this I understand so `||` is faster (which probably is the reason why I used it all the time). But how can `||` ever evaluate to `TRUE` when `|` evaluates to false? – speendo Oct 21 '19 at 15:58
ok, now I see there is a difference between `|` and `||`. Thank you. But I did run `length(column[column == "" | column == "0" | is.na(column)])` again and didn't receive an error but just the expected return of `24181`. Should I be worried about this? Sorry to bug you. I just don't get the point although you are obviously right :) – speendo Oct 21 '19 at 16:07

Have moved comments to answer. – G. Grothendieck Oct 21 '19 at 16:34
Thank you! I think I kind of get it now. Also the link helped! – speendo Oct 23 '19 at 05:48
