I am just learning R and trying to reproduce something that I can easily create in Excel with a PivotTable. The data below lists state names and their status. I want to make a horizontal bar chart that shows the state name on the Y axis and the percentage of rows with status "below" on the X axis.

state_name status
State 1 above
State 1 above
State 1 below
State 1 below
State 1 below
State 1 above
State 1 below
State 1 below
State 1 below
State 1 above
State 2 above
State 2 NA
State 2 NA
State 2 NA
State 2 NA
State 3 below
State 3 above
State 3 above
State 3 above
State 3 below
State 3 above
State 3 below
State 3 below
State 3 above

I can load the data but am not sure how to write the code to subset and create percentages.

Here is my poor attempt,

ggplot(data = subset(data, !is.na(status)), aes(y=state_name, x=count(status[below])/count(status))) +
  geom_bar(stat="identity")

Any help would be greatly appreciated. I learn best through examples.

maydin
Anthony Schmidt

2 Answers

You can use prop.table to get the percentages:

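library(ggplot2)

# Row-wise proportions of each status within a state (table() drops NA by default)
data_perc <- as.data.frame(prop.table(table(data), 1))
# Keep only the share of "below" for each state
data_perc <- data_perc[data_perc$status == "below", ]

ggplot(data = data_perc, aes(x = state_name, y = Freq, fill = state_name)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # horizontal bars: state names end up on the Y axis
  ggtitle("My Bar Chart")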

which gives:

[Image: horizontal bar chart of the proportion of "below" per state]

Data:

data <- read.table(text="state_name status
State1 above
State1 above
State1 below
State1 below
State1 below
State1 above
State1 below
State1 below
State1 below
State1 above
State2 above
State2 NA
State2 NA
State2 NA
State2 NA
State3 below
State3 above
State3 above
State3 above
State3 below
State3 above
State3 below
State3 below
State3 above", header = TRUE)
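
To see what the prop.table step produces before plotting, here is a minimal sketch in base R. It rebuilds the example data from its counts (the rep() construction is my shorthand, not part of the answer) and assumes the NA entries are real missing values, as read.table produces them.

```r
# Rebuild the example data from its counts (same totals as the data above).
data <- data.frame(
  state_name = rep(c("State1", "State2", "State3"), times = c(10, 5, 9)),
  status = c(rep(c("above", "below"), times = c(4, 6)),   # State1
             "above", rep(NA, 4),                          # State2
             rep(c("above", "below"), times = c(5, 4)))    # State3
)

counts <- table(data)            # NA rows are dropped by default (useNA = "no")
props  <- prop.table(counts, 1)  # margin 1: each state's row sums to 1
print(round(props, 3))

# Same table, but with the missing values counted in their own column.
with_na <- table(data, useNA = "ifany")
print(with_na)

# Long format, keeping only the share of "below" per state.
below <- as.data.frame(props)
below <- below[below$status == "below", ]
print(below)  # Freq is 0.6 for State1, 0 for State2 and 4/9 for State3
```

Because table() leaves out NA by default, State2 is computed from its single non-missing row and ends up at 0 for "below" without any explicit filtering. Multiply Freq by 100 if you want percentages rather than proportions.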
maydin
  • Nice answer, maydin. I learned two things: how to read in data as provided and how to use prop.table. I'm leaving my answer just for the variety of using a pipe operator and dplyr::summarise. – markhogue Aug 23 '19 at 20:35
  • This is great. How would you remove the NA values? – Anthony Schmidt Aug 26 '19 at 12:35
  • @AnthonySchmidt Since I filtered the data down to the **below** rows, no extra step is needed to remove the `NA` values. If you are asking in general, have a look at [this](https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame) question. – maydin Aug 28 '19 at 18:44

I saved your data to a file, with the state names written without spaces (e.g. state_1), and loaded it:

library(ggplot2)
library(dplyr)

states <- read.table("c:/R_files/SO.dat", header = TRUE)

# First look: maps the raw status labels to the axis, so it does not show percentages
ggplot(states, aes(state_name, status)) + geom_col() + coord_flip()

# Percentage of "below" per state (NA rows still count in the denominator)
states %>%
  group_by(state_name) %>%
  summarise(pct = 100 * length(which(status == "below")) / length(status)) %>%
  ggplot(aes(x = state_name, y = pct)) +
  geom_col(fill = "blue") +
  coord_flip()

[Image: horizontal bar chart of the percentage of "below" per state]
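
The summarise step can also be inspected on its own before piping it into ggplot. The following is a minimal sketch, assuming dplyr is installed and rebuilding the example data from its counts. The column names pct_all and pct_known are illustrative and do not come from the answer.

```r
library(dplyr)

# Rebuild the example data from its counts (same totals as in the question).
states <- data.frame(
  state_name = rep(c("State1", "State2", "State3"), times = c(10, 5, 9)),
  status = c(rep(c("above", "below"), times = c(4, 6)),   # State1
             "above", rep(NA, 4),                          # State2
             rep(c("above", "below"), times = c(5, 4)))    # State3
)

pct_below <- states %>%
  group_by(state_name) %>%
  summarise(
    # Same formula as the answer: NA rows stay in the denominator.
    pct_all   = 100 * length(which(status == "below")) / length(status),
    # Alternative: drop NA rows before taking the share.
    pct_known = 100 * mean(status == "below", na.rm = TRUE)
  )

print(pct_below)
```

For this data the two columns agree (60, 0 and about 44.4), because State2 has no "below" rows at all. They would differ for a state that mixes "below" and NA values, so it is worth deciding which denominator you want.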

markhogue