When I convert haven_labelled variables to factors, I (seem to) lose the underlying "labels" (tidyverse terminology, I think), i.e. the original numeric codes that the text labels were attached to.

# this sets up a labelled numeric variable with non-consecutive values
library(tidyverse)
library(labelled)

x <- sample(c(1, 5, 10, 20), 1000000, replace = TRUE, prob = c(0.1, 0.2, 0.65, 0.05))
x_tib <- tibble(value = x) %>% 
  set_value_labels(value = c("Letter A" = 1, 
                             "Letter B" = 5, 
                             "Letter C" = 10, 
                             "Letter D" = 20))

The attributes of x_tib$value are as I would expect:

attributes(x_tib$value)
glimpse(x_tib$value)
> attributes(x_tib$value)
$labels
Letter A Letter B Letter C Letter D 
       1        5       10       20 

$class
[1] "haven_labelled"

> glimpse(x_tib$value)
 'haven_labelled' num [1:1000000] 10 10 10 5 10 5 10 10 10 10 ...
 - attr(*, "labels")= Named num [1:4] 1 5 10 20
  ..- attr(*, "names")= chr [1:4] "Letter A" "Letter B" "Letter C" "Letter D"

However, after I convert this to a factor (as recommended in the haven documentation), I appear to lose the original "labels": the values 1, 5, 10, 20 become the integer codes 1, 2, 3, 4.

attributes(as_factor(x_tib$value))
glimpse(as_factor(x_tib$value))
> attributes(as_factor(x_tib$value))
$levels
[1] "Letter A" "Letter B" "Letter C" "Letter D"

$class
[1] "factor"

> glimpse(as_factor(x_tib$value))
 Factor w/ 4 levels "Letter A","Letter B",..: 3 3 3 2 3 2 3 3 3 3 ...
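
The 3 3 3 2 ... in the glimpse output are positions in the levels vector, not the original values. A minimal base R sketch of that behaviour (the vector `f` here is made up for illustration and is not the question's data):

```r
# Base R only: a factor stores 1-based positions into its levels vector
f <- factor(c("Letter C", "Letter B", "Letter C"),
            levels = c("Letter A", "Letter B", "Letter C", "Letter D"))

codes <- as.integer(f)      # positions in levels(f), not the original 10 / 5
labs  <- levels(f)[codes]   # maps the positions back to the label text
```

So any factor, however it is built, carries only level text plus positions; the original codes survive only if they are stored in the level text or kept in a separate column.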

Can I keep the underlying "labels"?

Note: I recognise that I can encode them in the factor levels via the `levels` argument of as_factor (e.g. as_factor(x_tib$value, levels = "values") or as_factor(x_tib$value, levels = "both")).
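
For reference, a minimal sketch of the options in the note, assuming the haven package is installed. The small vector `v` and the names `f_labels`, `f_values`, `f_both`, `lab` and `codes` are made up for illustration; the last two lines show one possible way to look the original codes up again from the "labels" attribute, not an official haven feature.

```r
library(haven)  # labelled(), as_factor()

# Small labelled vector mirroring the question's value/label pairs
v <- labelled(c(10, 5, 10, 20, 1),
              labels = c("Letter A" = 1, "Letter B" = 5,
                         "Letter C" = 10, "Letter D" = 20))

f_labels <- as_factor(v)                     # levels are the label text
f_values <- as_factor(v, levels = "values")  # levels are the codes, as text
f_both   <- as_factor(v, levels = "both")    # levels combine code and label

# Look the original codes up again via the "labels" attribute
lab   <- attr(v, "labels")                    # named numeric: label -> code
codes <- unname(lab[as.character(f_labels)])  # numeric codes, one per element
```

This only works while the labelled vector (or its "labels" attribute) is still available, so it may be simplest to keep the labelled column next to the factor column in the tibble.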

ChrisP
  • Why not use them as the factor levels? I don't know SPSS well, but I believe what are called labels in SPSS serve a similar purpose to factor levels in R, so your guess at the end seems appropriate – camille May 31 '19 at 17:35
  • In SPSS or Stata both the text label and numeric level contain information about the variable. For some reason haven seems to remove the numeric level in favour of a new "label". I'm just not sure why haven doesn't retain the original numeric level. – ChrisP May 31 '19 at 19:51
