I have a large survey that I am working with. The main survey data is stored as a CSV file. The variable and value label files are included in both SAS-format as well as SPSS-format, but stored as TXT files.
I have seen that there are several ways to easily read in data to R when it is saved in SAS/SPSS/STATA native formats (.sas/.sav/.dta). So far, all of the solutions that I've come across using haven
, labelled
, foreign
solutions assume that the data is stored using the respective "other" program's native format. I can't seem to find any advice on how to add labels to CSV data using TXT files stored in some standard SAS/SPSS format.
So let's say the three files I have are named:
data.csv
sas_var_labels.txt
sas_val_labels.txt
data.csv
looks something like:
AB001; AB002; AC001
-9; -9; -7
-1; -9; -8
-3; -9; 100
-9; -1; 200
-4; -1; 100
sas_var_labels.txt
looks like this:
AB001 = "A-Section A, category B, question 1"
AB002 = "A-Section A, category B, question 2"
AC001 = "A-Section A, category C, question 1"
And finally, sas_val_labels.txt
looks like this:
; value AB001 -9 = "-9.not applicable"
-8 = "-8.no response"
-7 = "-7.unknown"
-1 = "-1.other duration"
1 = "1.1 year"
2 = "2.1 to 3 years"
3 = "3.4 to 6 years"
4 = "4.More than 6 years"
; value AB002 -9 = "-9.not applicable"
-8 = "-8.no response"
-7 = "-7.unknown"
-1 = "-1.other type"
; value AC001 -9 = "-9.not applicable"
-8 = "-8.no response"
-7 = "-7.unknowns"
-5 = "-5.non-codable"
-1 = "-1.other category"
100 = "100.First division"
200 = "200.Second division"
My approach so far:
For the main data, I just call:
dat <- read.csv("data.csv", sep=";", stringsAsFactors=FALSE)
For the variable labels, as the number of labels matches the columns in the data exactly, I'm just reading in the TXT file the same way, and then assigning the second column to the data using var_label()
from the labelled
package:
#first I read in the variable labels into R as a dataframe
var_labs <- read.csv("sas_var_label.txt", sep="=", stringsAsFactors=FALSE, header=FALSE, strip.white=TRUE)
#next, I assign the second column to the data as it matches exactly (for the moment)
labelled::var_label(dat) <- var_labs$V2
For the value labels, on the other hand, my approach becomes more complicated as the labels file is not neatly separated by delimiters for all of the columns and row combinations, as you can see in the example provided above.
My first question is: is there an easy way to read in this file so that the structure of the value label TXT file is preserved?
My more general question is: is there a a better approach to handling these label TXT files? I'm sure I'm missing something, so any suggestions would be welcome.