I'm struggling to find an efficient way to turn labelled variables into factors. The dataset I'm working with is available here: [https://www.dropbox.com/s/jhp780hd0ii3dnj/out.sav?dl=0][1]. It is an SPSS data file, a format I use because it is what my colleagues use.
When I read in the data, every categorical variable in the file comes in with the "labelled" class.

    #load libraries
    library(haven)
    library(tidyr)
    library(dplyr)
    library(ggplot2)
    #Import
    test<-read_sav('~/your/path/name/out.sav')
    #Structure
    str(test)
    #Find Class
    sapply(test, class)
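In case the Dropbox file is not to hand, here is a minimal sketch of the same situation on made-up data. The `toy` tibble and its codes and labels are hypothetical stand-ins, not values taken from out.sav; `labelled()` and `is.labelled()` are haven functions.

```r
library(haven)
library(tibble)

# Hypothetical stand-in for the SPSS file: two labelled columns and one plain numeric
toy <- tibble(
  income = labelled(c(1, 2, 2, 1), labels = c(Low = 1, High = 2)),
  stress = labelled(c(0, 1, 1, 0), labels = c(No = 0, Yes = 1)),
  age    = c(34, 51, 27, 45)
)

# Which columns carry value labels?
is_lab <- sapply(toy, is.labelled)
is_lab
```

Checking with `is.labelled()` rather than comparing class names should be less sensitive to the haven version, since the class name of labelled vectors has changed between releases.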
The first problem I have is that ggplot2 doesn't know how to pick a scale for a labelled variable.

    #Frequency table of stress within income
    td<-test %>%
      select(income, stress) %>%
      group_by(income, stress)%>%
      #Drop missings on either variable
      filter(!is.na(stress), !is.na(income))%>%
      summarize(Freq=n())%>%
      #Percent is computed within each income group
      mutate(Percent=(Freq/sum(Freq))*100)
    #Draw plot
    ggplot(td, aes(x=income, y=Percent, group=stress))+
      #barplot
      geom_bar(aes(fill=stress), stat='identity')
That can be solved quite nicely by wrapping the categorical variable 'income' in as_factor():

    #Draw plot
    ggplot(td, aes(x=as_factor(income), y=Percent, group=stress))+
      #barplot
      geom_bar(aes(fill=stress), stat='identity')
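To make concrete what as_factor() does here, a minimal sketch on a made-up vector (the codes and labels below are hypothetical, not the real `income` coding from the file): the value labels become the factor levels, ordered by their underlying values.

```r
library(haven)

# Hypothetical labelled vector standing in for `income`
income <- labelled(c(1, 2, 2, 1), labels = c(Low = 1, High = 2))

# as_factor() turns the value labels into factor levels
income_f <- as_factor(income)
levels(income_f)
```

Because the result is an ordinary factor, ggplot2 can give it a discrete scale without any further help.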
That works for one variable. However, if I'm doing exploratory research, I may be making a lot of plots with a lot of labelled variables, and that strikes me as quite a lot of extra typing.
The problem is magnified when you gather a lot of variables to plot several crosstabs, because you lose the value labels.

    ##Visualizations
    test<-test %>%
      #The first two variables are the grouping variables for a series of cross tabs
      select(ford, stress, resp_gender, immigrant2, education, property, commute, cars, religion) %>%
      #Some renamings
      rename(gender=resp_gender, educ=education, immigrant=immigrant2, relig=religion)%>%
      #Melt all variables other than ford and stress
      #(names go to 'variable', values go to 'category')
      gather(variable, category, -ford, -stress)%>%
      #Group by all variables
      group_by(variable, category, ford, stress) %>%
      #filter out missings on the grouping variables
      filter(!is.na(stress), !is.na(ford))%>%
      #filter out missings on the melted values
      filter(!is.na(category))%>%
      #summarize
      summarize(freq=n())
    #Show plots, one panel per melted variable
    ggplot(test, aes(x=as_factor(category), y=freq, group=as_factor(ford)))+
      geom_bar(stat='identity', position='dodge', aes(fill=as_factor(ford)))+
      facet_grid(~variable, scales='free')
So now all of the value labels for the variables that were melted have disappeared. The only way I can see to prevent this is to use as_factor() on each labelled variable individually, turning it into a factor with the value labels as the factor levels. But, again, that is a lot of typing.
I guess my question is: what is the most efficient way to deal with the labelled class and turn such variables into factors, specifically with regard to ggplot2?
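For reference, here is a rough sketch of the kind of bulk conversion I have in mind, on made-up data rather than the real file. It assumes that haven's as_factor() also accepts a whole data frame and, by default, converts only the labelled columns; the `toy` tibble, its codes, and its labels are hypothetical. I'm not sure this is the most efficient or idiomatic route.

```r
library(haven)
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the SPSS file
toy <- tibble(
  income = labelled(c(1, 2, 2, 1), labels = c(Low = 1, High = 2)),
  stress = labelled(c(0, 1, 1, 0), labels = c(No = 0, Yes = 1)),
  gender = labelled(c(1, 2, 1, 2), labels = c(Male = 1, Female = 2)),
  educ   = labelled(c(1, 1, 2, 3),
                    labels = c(School = 1, College = 2, University = 3))
)

# Convert every labelled column to a factor in one call
toy_f <- as_factor(toy)

long <- toy_f %>%
  # Store the labels as text first, so columns with different levels stack cleanly
  mutate(across(c(gender, educ), as.character)) %>%
  # Melt everything except the two grouping variables
  gather(variable, category, -income, -stress) %>%
  # Frequency of each category within income and stress
  count(variable, category, income, stress, name = "freq")
```

The idea is to convert before reshaping, so that the `category` column holds the label text instead of bare codes and can be passed to ggplot2 directly.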