I'm using the haven package for R to read an SPSS file with user_na=TRUE. The file has many string variables with value labels. In R, only the first of the string variables (SizeofH1) has the correct value labels assigned to it as an attribute. Unfortunately I cannot provide even a snippet of this data to make this fully reproducible, but here is a screenshot of what I can see in PSPP

[Screenshot: PSPP Data editor]

and what str() in R returns...

 $ SizeofH1:Class 'labelled'  atomic [1:280109] 3 3 3 3 ...
 ..- attr(*, "label")= chr "Size of Household ab 2002"
 ..- attr(*, "format.spss")= chr "A30"
 ..- attr(*, "labels")= Named chr [1:9] "1" "2" "3" "4" ...
 ..- attr(*, "names")= chr [1:9] "4 Persons" "2 Persons" "1 Person 50 years plus" "3 Persons" ...
 $ PROMOTIO: atomic  40 1 40 40 ...
 ..- attr(*, "label")= chr "PROMOTION"
 ..- attr(*, "format.spss")= chr "A30"
 $ inFMCGfr: atomic  1 1 1 1 ...
 ..- attr(*, "label")= chr "in FMCG from2011"
 ..- attr(*, "format.spss")= chr "A30"
 $ TRADESEG: atomic  1 1 1 1 ...
 ..- attr(*, "label")= chr "TRADE SEGMENT"
 ..- attr(*, "format.spss")= chr "A30"
 $ ORGANISA: atomic  111 111 111 111 ...
 ..- attr(*, "label")= chr "ORGANISATION"
 ..- attr(*, "format.spss")= chr "A30"
 $ NAME    : atomic  9 9 9 9 ...
 ..- attr(*, "label")= chr "NAME"
 ..- attr(*, "format.spss")= chr "A30"

I hope someone can point me to a possible cause of this behavior.

supersambo
  • Changing the variable type from string to numeric (in SPSS) solved the issue for me in this case. However, I'm still not sure why the first column was read correctly, or how to solve this problem without access to SPSS. – supersambo Oct 11 '16 at 14:33

2 Answers

The "semantics" vignette has some useful information on this topic.

library(haven)
vignette('semantics')

There are a couple of ways to get the value labels. I think a good one is the example below, which uses the map function from the purrr package (lapply would work as well).

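library(purrr)

# Get data from the SPSS file
df <- read_sav(path_to_file)

# Capture the variable labels first; fall back to the column name if a label is missing
var_labels <- map_chr(names(df), function(nm) {
  lbl <- attr(df[[nm]], 'label')
  if (is.null(lbl)) nm else lbl
})

# Convert labelled columns to factors so the value labels become factor levels
df[] <- map(df, function(x) if (is.labelled(x)) as_factor(x) else x)

# Use the variable labels as column names
colnames(df) <- var_labels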
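
For reference, the lapply variant mentioned above might look roughly like this. It is only a sketch: the toy data frame stands in for the output of read_sav(), and its column names, values and labels are made up for illustration.

```r
library(haven)

# Toy stand-in for data read with read_sav(); names and labels are hypothetical
toy <- data.frame(id = 1:3)
toy$size <- labelled(c(1, 2, 1), labels = c("1 Person" = 1, "2 Persons" = 2))

# Replace each labelled column with a factor and leave all other columns untouched
toy[] <- lapply(toy, function(x) if (is.labelled(x)) as_factor(x) else x)

str(toy)
```

Assigning into toy[] keeps the data frame structure, so no purrr dependency is needed.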
JanLauGe
  • I was using the code for the '# get value labels' bit, and it worked perfectly. But last few weeks something seems to have changed and the factorization does not take place any more, without me changing the code. The haven package as_factor in this case helped a lot, with only_labelled = TRUE (default), as suggested here: https://stackoverflow.com/a/52023520/8124725 – 4rj4n Mar 21 '19 at 09:58
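
The approach from that comment, calling haven's as_factor on the whole data frame, could be sketched as follows. The toy data frame is an assumption standing in for a real file, with made-up string codes and labels similar to those in the question.

```r
library(haven)

# Toy stand-in for data read with read_sav(); codes and labels are hypothetical
survey <- data.frame(id = 1:3)
survey$size <- labelled(
  c("1", "2", "1"),
  labels = c("1 Person" = "1", "2 Persons" = "2")
)

# only_labelled = TRUE (the default) converts labelled columns and skips the rest
survey_factors <- as_factor(survey, only_labelled = TRUE)

levels(survey_factors$size)
```

This avoids testing class(x) by hand, which is the part that can silently stop matching when the class name of labelled vectors changes between haven versions.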

The best option is to save your SPSS file as CSV and then read that into R. I've faced this before, and some strings weren't read correctly. SPSS generally doesn't handle string variables well, which could contribute to the problem.

RomRom
  • Thanks. This actually helps. However, I had hoped there was a way to solve this issue without using SPSS. – supersambo Oct 21 '16 at 07:52
  • If you don't have an SPSS license, there is an open-source application similar to SPSS that lets you import SAV files and export them to CSV. You can find the software here: http://www.gnu.org/software/pspp/ and a guide here: http://lists.gnu.org/archive/html/pspp-users/2011-11/msg00033.html – RomRom Oct 21 '16 at 10:17