This is a solution I found to work with labelled data from SPSS in R.

I'm working with a survey provided in SPSS format, and I moved from the foreign package to haven.

I read "Convenient way to access variables label after importing Stata data with haven", but I could not find a way to express my labelled variables as factors.

I tried to extract the attributes using the purrr package and then convert some variables to factors. No success!

How to work with labelled data from SPSS in R

1: Read the data

library(dplyr)
library(haven)
library(purrr)
library(sjlabelled)

url = "http://users.dcc.uchile.cl/~mvargas/auxiliares_cc5208/nesi_individuals_with_grants_2015_spss.zip"
zip = paste0(getwd(),"/nesi_individuals_with_grants_2015_spss.zip")
sav = paste0(getwd(),"/nesi_individuals_with_grants_2015.sav")

download.file(url, zip, method="curl")

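# extract the .sav file into the working directory (no external 7z needed)
unzip(zip, exdir = getwd(), junkpaths = TRUE)

# read_sav() already returns a tibble, so no tbl_df() wrapper is needed
nesi_individuals_with_grants = read_sav(sav)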

# as expected, the variable has no factor levels, so this returns NULL
# B14 records where people work (e.g. 1 = startup, 2 = bank, 3 = hospital, etc.)
levels(nesi_individuals_with_grants$B14)

2: Create a table that shows what the numeric codes mean (their value labels):

classifications_all = nesi_individuals_with_grants %>%
    select(OCUP_REF, SEXO, CISE, CINE, B1, B14, C1) %>%
    rename(occupation_id = OCUP_REF, sex_id = SEXO, icse_id = CISE, isced_id = CINE,
           isco_id = B1, journey_id = C1)

occupation = classifications_all %>%
    select(occupation_id) %>%
    mutate(occupation = get_label(occupation_id)) %>%
    distinct()

That returns

# A tibble: 3 x 2
  occupation_id                                           occupation
      <dbl+lbl>                                                <chr>
1             1 Binario Ocupados de Referencia Tabulados de Personas
2           NaN Binario Ocupados de Referencia Tabulados de Personas
3             0 Binario Ocupados de Referencia Tabulados de Personas

That is the variable label, not the value labels, so I then tried get_labels() instead:

occupation = classifications_all %>%
    select(occupation_id) %>%
    distinct() %>%
    filter(!is.nan(occupation_id)) %>%
    mutate(occupation = get_labels(occupation_id))

It works!

> occupation
# A tibble: 2 x 2
  occupation_id                                      occupation
      <dbl+lbl>                                           <chr>
1             1 Ocupados con menos de 1 mes en el empleo actual
2             0   Ocupados con más de 1 mes en el empleo actual
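One caveat about the step above: get_labels() returns one entry per defined value label rather than one per row, so assigning it inside mutate() lines up only when the distinct codes happen to match the labels in number and order. A minimal sketch of a lookup table built from the "labels" attribute, which pairs each code with its own label explicitly. The toy vector and its English labels are made up for illustration and are not taken from the survey:

```r
# Hypothetical toy vector standing in for OCUP_REF (not the real survey data)
x <- haven::labelled(
  c(1, 0, 1, NA),
  labels = c("more than 1 month" = 0, "less than 1 month" = 1),
  label  = "Reference employment"
)

# get_label() gives the single variable label
variable_label <- sjlabelled::get_label(x)

# get_labels() gives the value labels, one per defined code
value_labels <- sjlabelled::get_labels(x)

# The "labels" attribute is a named numeric vector: names are the labels,
# values are the codes, so each code stays paired with its own label
labs <- attr(x, "labels")
lookup <- data.frame(
  occupation_id = unname(labs),
  occupation    = names(labs),
  stringsAsFactors = FALSE
)
print(lookup)
```

The lookup data frame can then be joined back to the data by occupation_id, which does not depend on row order.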
zx8754
pachamaltese

1 Answer

Do you want to set value labels as factor levels? Then you could try sjlabelled::as_label() or sjmisc::to_label(). Both do the same thing; I did not completely remove to_label() from sjmisc, but kept it for backwards compatibility.
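As a rough illustration of that suggestion, here is a small sketch on a toy labelled vector. The vector is hypothetical and only mimics the B14 example codes from the question. Calls are namespaced because haven and sjlabelled export some functions with the same name:

```r
# Hypothetical toy vector mimicking B14 (1 = startup, 2 = bank, 3 = hospital)
b14 <- haven::labelled(
  c(1, 2, 3, 2, NA),
  labels = c(startup = 1, bank = 2, hospital = 3),
  label  = "Place of work"
)

# Before conversion there are no factor levels
stopifnot(is.null(levels(b14)))

# as_label() replaces the numeric codes with their value labels
b14_factor <- sjlabelled::as_label(b14)

print(levels(b14_factor))
print(table(b14_factor, useNA = "ifany"))
```

haven::as_factor(b14) is an alternative that should give a comparable factor without needing sjlabelled.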

Daniel