
I am reading a set of SAS datasets into R. Is there code I can use to get the variable names and variable labels into a data.frame, something like a codebook?

I used the haven package to read in the data:

haven::read_sas

I wonder whether it saves the variable labels somewhere. If so, how can I get them out?

The data in R looks like this:

enter image description here

I want to build a data.frame that looks like this:

enter image description here

Error message:

<error/purrr_error_bad_element_vector>
Result 6 must be a single string, not NULL of length 0
Backtrace:
     x
  1. +-base::debug(list_of_labels <- lapply(datasets, label_lookup_map))
  2. +-base::lapply(datasets, label_lookup_map)
  3. | \-global::FUN(X[[i]], ...)
  4. |   \-tibble::tibble(col_name = df %>% names(), labels = df %>% map_chr(attr_getter("label")))
  5. |     \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
  6. |       \-rlang::eval_tidy(xs[[j]], mask)
  7. +-df %>% map_chr(attr_getter("label"))
  8. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  9. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 10. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 11. |     \-`_fseq`(`_lhs`)
 12. |       \-magrittr::freduce(value, `_function_list`)
 13. |         +-base::withVisible(function_list[[k]](value))
 14. |         \-function_list[[k]](value)
 15. |           \-purrr::map_chr(., attr_getter("label"))
 16. \-purrr:::stop_bad_element_vector(...)
 17.   \-purrr:::stop_bad_vector(...)
 18.     \-purrr:::stop_bad_type(...)

It looks like the error was caused by a dataset that looks like this:

enter image description here

Sample data can be built with:

df<- structure(list(VISITNUM = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 14, 14, 14, 14), EXDOSE = c(36, 109, 182, 182, 
182, 182, 182, 55, 36, 55, 36, 55, 109, 182, 109, 182, 2600, 
2600, 2600, 2600), EXDOSU = c("mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg")), label = "EX                              ", row.names = c(NA, 
20L), class = "data.frame")
Stataq

1 Answer


You may find this question helpful: Extract the labels attribute from "labeled" tibble columns from a haven import from Stata

Here's an example:

library(haven)
library(tidyverse)

airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")

label_lookup_map <- tibble(
  col_name = airline %>% names(),
  labels = airline %>% map_chr(attr_getter("label"))
)

print(label_lookup_map)
# # A tibble: 6 x 2
# col_name labels         
# <chr>    <chr>          
# 1 YEAR   year           
# 2 Y      level of output
# 3 W      wage rate      
# 4 R      interest rate  
# 5 L      labor input    
# 6 K      capital input
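
The lookup above relies on each column carrying its label as a "label" attribute, which is what `attr_getter("label")` reads. As a minimal sketch in base R, assuming a hand-built toy data.frame (the `toy` object and its labels are made up for illustration, not taken from the SAS files), the same attribute can be inspected directly. A label may also sit on the data.frame itself, as in the question's sample data.

```r
# Toy stand-in for an imported dataset; the labels are set by hand here.
toy <- data.frame(EXDOSE = c(36, 109), EXDOSU = c("mg", "mg"))
attr(toy$EXDOSE, "label") <- "Dose"   # variable (column) label
attr(toy, "label") <- "EX"            # dataset-level label on the data.frame

col_label     <- attr(toy$EXDOSE, "label", exact = TRUE)  # label of one column
missing_label <- attr(toy$EXDOSU, "label", exact = TRUE)  # NULL: column has no label
df_label      <- attr(toy, "label", exact = TRUE)         # label of the data.frame

print(col_label)
print(missing_label)
print(df_label)
```

`exact = TRUE` is used so that a "labels" attribute (value labels) is not picked up by partial matching when no "label" attribute exists.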

Edit: Based on the comments, here's an example if you wanted to get the labels for multiple data.frames in a list where some of the data.frames do not have labels.

library(haven)
library(tidyverse)

airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
cola <- read_sas("http://www.principlesofeconometrics.com/sas/cola.sas7bdat")
data(iris)

list_of_tbl <- list(airline, cola, iris)

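# Getter that returns a column's "label" attribute (NULL if absent)
get_labels <- attr_getter("label")

# TRUE if at least one column of df carries a label
has_labels <- function(df) {
  !all(sapply(lapply(df, get_labels), is.null))
}

# Build a col_name/labels lookup table for one data.frame.
# Labels stay NA when no column is labelled; note that map_chr()
# still errors if only some of the columns are labelled.
label_lookup_map <- function(df) {
  df_labels <- NA
  if (has_labels(df)) {
    df_labels <- df %>% map_chr(get_labels)
  }

  tibble(
    col_name = df %>% names(),
    labels = df_labels
  )
}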

list_of_labels <- lapply(list_of_tbl, label_lookup_map)

print(list_of_labels)
# [[1]]
# # A tibble: 6 x 2
#   col_name labels         
#   <chr>    <chr>          
# 1 YEAR     year           
# 2 Y        level of output
# 3 W        wage rate      
# 4 R        interest rate  
# 5 L        labor input    
# 6 K        capital input  

# [[2]]
# # A tibble: 5 x 2
#   col_name labels                                   
#   <chr>    <chr>                                    
# 1 ID       customer id                              
# 2 CHOICE   = 1 if brand chosen                      
# 3 PRICE    price of 2 liter soda                    
# 4 FEATURE  = 1 featured item at the time of purchase
# 5 DISPLAY  = 1 if displayed at time of purchase     

# [[3]]
# # A tibble: 5 x 2
#   col_name     labels
#   <chr>        <lgl> 
# 1 Sepal.Length NA    
# 2 Sepal.Width  NA    
# 3 Petal.Length NA    
# 4 Petal.Width  NA    
# 5 Species      NA 
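
If a data.frame has labels on only some of its columns, `map_chr()` would still stop with the "must be a single string, not NULL" error. A possible variant, sketched here in base R under the assumption that a missing label should simply become `NA`, handles each column separately. The helper name `label_lookup_safe` and the toy data are hypothetical and not part of the code above.

```r
# Hypothetical helper: one row per column, with NA_character_
# wherever a column carries no "label" attribute.
label_lookup_safe <- function(df) {
  labels <- vapply(
    df,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else as.character(lab)[1]
    },
    character(1)
  )
  data.frame(
    col_name = names(df),
    labels = unname(labels),
    stringsAsFactors = FALSE
  )
}

# Toy data: one labelled column, one unlabelled column.
toy <- data.frame(VISITNUM = c(4, 14), EXDOSE = c(36, 2600))
attr(toy$VISITNUM, "label") <- "Visit number"

codebook <- label_lookup_safe(toy)
print(codebook)
```

For a list of data.frames, `lapply(list_of_tbl, label_lookup_safe)` should give one lookup table per element, with a character `labels` column throughout.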
vikjam
  • Thanks so much. What if `airline` is a list that contains multiple data.frames? How can I get to the data.frame level and then get col_name and labels? – Stataq Dec 16 '20 at 04:12
  • Perhaps you could create a function and use `lapply`. – vikjam Dec 16 '20 at 04:19
  • Could you give me an example? – Stataq Dec 16 '20 at 04:20
  • If `airline` has 3 data.frames, `df1`, `df2` and `df3`, how can I assign the name for the mapping file? I can think of `label_lookup_map % names(), labels = x %>% map_chr(attr_getter("label")) )} lapply(airline, label_lookup_map)`. – Stataq Dec 16 '20 at 04:28
  • Thanks so much for the updated answer. When I ran the code on my data, I got the error `Error: Result 6 must be a single string, not NULL of length 0`. Do you know what it means? What should I do to fix it, and how can I find out what causes it? – Stataq Dec 16 '20 at 04:38
  • I tried `lst1 – Stataq Dec 16 '20 at 04:45
  • I updated the post with more info on the error. I don't know how to interpret it – Stataq Dec 16 '20 at 04:52
  • Seems like the error is occurring because one of the data.frames in the list does not have labels. I've added additional code to account for this. – vikjam Dec 16 '20 at 05:10