
I've used the haven package to read SPSS data into R. Everything looks fine, except that subsetting the data doesn't behave as I expect. Here's the code (I don't have SPSS to create example data, and I can't post the real data):

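```r
library(haven)
df <- read_spss("filename1.sav")
# keep rows whose variable1 value label is "factor1"
tmp <- df[as_factor(df$variable1) == "factor1", ]
# then drop rows where variable2 is NA
tmp <- tmp[!is.na(tmp$variable2), ]
```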

The data frame `df` has NAs scattered throughout. I expected the code above to keep only the rows where `variable1` is "factor1" and then discard every row where `variable2` is NA. The first subset works as expected, but the second does not: it removes rows, yet NAs are still present.
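One base-R behaviour worth ruling out when subsetting with a logical condition (an assumption; nothing in the post confirms it is the cause here): if the logical row index contains `NA`, `[.data.frame` returns a row filled with `NA` rather than dropping it, whereas `which()` keeps only the `TRUE` positions. The toy data frame below is hypothetical.

```r
# Hypothetical toy data: the second value of `group` is missing
toy <- data.frame(group = c("factor1", NA, "factor2"),
                  score = c(1, 2, 3))

# Logical index: TRUE, NA, FALSE
keep <- toy$group == "factor1"

# The NA in the index yields an extra row of NAs
by_logical <- toy[keep, ]

# which() returns only the TRUE positions, so the NA is dropped
by_which <- toy[which(keep), ]

print(by_logical)
print(by_which)
```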

I suspect the issue has to do with the way haven structures the imported data, using the class `labelled` instead of an actual factor, but that's beyond me. Does anyone know what could be happening, and how to do this subset correctly?
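To pin down where "NAs are still present" after the second subset, counting the NAs per column can help tell apart NAs left in `variable2` from NAs in other columns. This is a sketch on a small hypothetical stand-in for `tmp` (the real data isn't available); only the column name `variable2` comes from the post.

```r
# Hypothetical stand-in for `tmp`: variable2 has NAs, and so does another column
tmp <- data.frame(variable2 = c(3, NA, 1, NA),
                  other     = c(NA, 5, 6, 7))

tmp <- tmp[!is.na(tmp$variable2), ]

# Number of NAs remaining in each column
na_counts <- colSums(is.na(tmp))
print(na_counts)

# Columns that still contain at least one NA
print(names(na_counts)[na_counts > 0])
```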

Here's the structure of df, variable1 and variable2:

```
> str(df)
'data.frame':   4573 obs. of  316 variables:

> str(df$variable1)
Class 'labelled'  atomic [1:4573] 9 9 9 14 8 8 2 4 8 16 ...
  ..- attr(*, "labels")= Named num [1:18] 1 2 3 4 5 6 7 8 9 10 ...
  .. ..- attr(*, "names")= chr [1:18] "factor1" "factor2" "factor3" "factor4" ...

> str(df$variable2)
Class 'labelled'  atomic [1:4573] 3 NA 3 NA 3 NA 1 1 NA NA ...
  ..- attr(*, "labels")= Named num [1:3] 1 2 3
  .. ..- attr(*, "names")= chr [1:3] "Sponsor" "Not a Sponsor" "Don't Know"
```
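Since the real data can't be shared, here is a minimal sketch of a reproducible example that needs neither SPSS nor a `.sav` file. It assumes the column layout shown by `str()` above and rebuilds two `labelled`-style columns with base R's `structure()` (a `labels` attribute plus class `labelled`), so it only approximates what haven produces. The data values and the helper `label_of()` are hypothetical; `label_of()` stands in for `haven::as_factor()` so the sketch runs without haven loaded.

```r
# Hypothetical data mimicking the str() output: numeric codes with a "labels" attribute
v1_labels <- c(factor1 = 1, factor2 = 2, factor3 = 3)
v2_labels <- c("Sponsor" = 1, "Not a Sponsor" = 2, "Don't Know" = 3)

df <- data.frame(id = 1:6)
df$variable1 <- structure(c(1, 1, 2, 1, 3, 1),
                          labels = v1_labels, class = "labelled")
df$variable2 <- structure(c(3, NA, 3, NA, 1, 1),
                          labels = v2_labels, class = "labelled")

# Hypothetical helper standing in for haven::as_factor():
# map each numeric code to the name of its value label
label_of <- function(x) {
  labs <- attr(x, "labels")
  names(labs)[match(unclass(x), labs)]
}

# Same two steps as in the question
tmp <- df[label_of(df$variable1) == "factor1", ]
tmp <- tmp[!is.na(tmp$variable2), ]
print(tmp)
```

If this toy version drops the NAs but the real data does not, comparing `str(tmp$variable2)` and `sum(is.na(tmp$variable2))` between the two may help narrow down where the behaviour differs.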
ssp3nc3r