1

http://imgur.com/IfVyu6f

I thought it would be called something like "cumulative", and I found the cumulative frequency graph and the cumulative flow diagram. However, I don't think either is the graph in the image, because cumulative graphs start from 0 and my variables do not. A density plot sounds the closest, but it shows a distribution whose area sums to 1, whereas I want to show frequencies.

Basically, the variables are sub-parts of a main variable, and I want to show when these sub-variables converge to create a peak. In essence, the variables are summed, so the top of the stack shows the total.

THIS USER NEEDS HELP
  • It's called a **stacked area** chart. See this answer here: http://stackoverflow.com/questions/4651428/making-a-stacked-area-plot-using-ggplot2 – neerajt Sep 24 '15 at 06:24
  • Thank you. It seems really close to what I need. But instead of density, I need Y axis to display frequencies. – THIS USER NEEDS HELP Sep 24 '15 at 06:27
  • maybe this can help http://stackoverflow.com/questions/18519243/ggplot-legend-key-color-and-items – Keniajin Sep 24 '15 at 06:29
  • If you give us a general idea of what your data look like, you'll get a better answer. In the meantime, the answer @Keniajin provided will work for displaying frequencies/counts. – neerajt Sep 24 '15 at 09:00
  • @neerajt So the data are crime frequencies. I want to divide the crimes into several categories (assault, robbery, sex crime, etc.) and plot each category in the stacked area chart, so that the reader has a clear understanding of which crime peaks at which time and when these crimes happen the most altogether – THIS USER NEEDS HELP Sep 24 '15 at 23:29

2 Answers

2

With ggplot2, you can use the geom_area() function:

library(ggplot2)
library(gcookbook) # For the data set

ggplot(uspopage, aes(x=Year, y=Thousands, fill=AgeGroup)) + geom_area()
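
For readers who don't have gcookbook installed, here is a minimal sketch of the same call on a small made-up data frame. The `toy` data and its column names are hypothetical, not part of the original answer. It shows the shape geom_area() expects when used this way: one row per x value per group, with the y value already tabulated.

```r
# Hypothetical long-format data: one row per (hour, type) pair,
# with the count already computed.
toy <- data.frame(
  hour  = rep(0:3, times = 2),
  type  = rep(c("A", "B"), each = 4),
  count = c(1, 3, 2, 0, 2, 1, 4, 1)
)

# Plot only if ggplot2 is available; geom_area() stacks the groups by default.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = hour, y = count, fill = type)) + geom_area()
  print(p)
}
```

If the data has one row per event instead of a count column, the counts have to be computed first, either by a ggplot2 stat or by tabulating beforehand.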
neerajt
Keniajin
  • May I ask what gcookbook is used for? If I were to use ggplot2, I should do `library(ggplot2)` and I should be able to use ggplot and geom_area, correct? – THIS USER NEEDS HELP Sep 24 '15 at 06:31
  • Sure. The `gcookbook` package contains the data sets used in the book "R Graphics Cookbook" by Winston Chang. I used it to get the `uspopage` data – Keniajin Sep 24 '15 at 06:51
  • In this case, `uspopage` seems to have the frequency tabulated already. My data doesn't have the frequency calculated yet. One thing I tried was to make a new column of 1s and use fun.y = sum, but it seems very hacky. Is there a way to use ggplot for data that doesn't have a frequency column? – THIS USER NEEDS HELP Sep 25 '15 at 04:34
2

Thanks for sharing a little more about what your data look like.

Let's use the publicly available crime stats data from the Houston Police Department as an example. In this case, we're using the data set for the month of January, 2015.

library(ggplot2)

crime <- gdata::read.xls('http://www.houstontx.gov/police/cs/xls/jan15.xls')

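# There's a single case in there where the offense type is called '1'.
# That doesn't make sense to us, so we'll remove it.
crime <- crime[crime$Offense.Type != '1', ]
# Re-create the factor so the unused '1' level is dropped; unlike droplevels(),
# this also works if the column was read in as character.
crime$Offense.Type <- factor(crime$Offense.Type)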

There are 10 columns, but the ones we're interested in look like this:

# Hour Offense.Type
# 8   Auto Theft
# 13  Theft
# 5   Auto Theft
# 13  Theft
# 18  Theft
# 18  Theft

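As you mentioned, the problem is that each row is a single incident. We need a way to get frequencies on a per-hour basis to pass to geom_area().

The first way is to let ggplot2 handle it, with no need to preformat the data. Note that stat='density' draws a kernel density estimate scaled to counts, so the heights are smoothed approximations rather than exact per-hour tallies.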

p <- ggplot(crime, aes(x = Hour, fill = Offense.Type))
# after_stat(count) replaces the older ..count.. notation (ggplot2 >= 3.3.0).
p + geom_area(aes(y = after_stat(count)), stat = 'density')

ggplot density method
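
If exact per-hour tallies are preferred over a smoothed curve, one option that still needs no preformatting is to bin the hours with a width of one. This is only a sketch on made-up data: the `incidents` data frame is hypothetical and stands in for `crime`, and it assumes Hour holds integers from 0 to 23.

```r
# Hypothetical incident-level data: one row per incident, no frequency column.
set.seed(1)
incidents <- data.frame(
  Hour = sample(0:23, 200, replace = TRUE),
  Offense.Type = sample(c("Theft", "Burglary", "Robbery"), 200, replace = TRUE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # stat = "bin" with binwidth = 1 counts incidents in one-hour bins.
  # Every offense type is binned on the same breaks, so the areas stack.
  p <- ggplot(incidents, aes(x = Hour, fill = Offense.Type)) +
    geom_area(stat = "bin", binwidth = 1)
  print(p)
}
```

The trade-off is a more jagged outline than the density version, in exchange for a y axis that reads as actual counts.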

The other way is to preformat the frequencies table, using R's table() and reshape2's melt():

library(reshape2)
# Cross-tabulate incidents by hour and offense type.
crime.counts <- table(crime$Hour, crime$Offense.Type)
# melt() on a table takes varnames (not id.vars) to name the two dimensions.
crime.counts.l <- melt(crime.counts,
                        varnames = c("Hour", "Offense.Type"),
                        value.name = "numberOfCrimes")

p <- ggplot(crime.counts.l, aes(x = Hour,
                                 y = numberOfCrimes,
                                 fill = Offense.Type))
p + geom_area()

preformatted table method
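
If you would rather not depend on reshape2, base R can build the same long table: as.data.frame() on a table object returns one row per combination plus a count column. The sketch below uses a small made-up data frame (`incidents` is hypothetical and stands in for `crime`).

```r
# Hypothetical stand-in for the crime data: one row per incident.
incidents <- data.frame(
  Hour = c(8, 13, 5, 13, 18, 18, 8),
  Offense.Type = c("Auto Theft", "Theft", "Auto Theft", "Theft",
                   "Theft", "Theft", "Theft")
)

# Cross-tabulate, then flatten to long format.
# responseName sets the name of the count column.
counts <- as.data.frame(
  table(Hour = incidents$Hour, Offense.Type = incidents$Offense.Type),
  responseName = "numberOfCrimes",
  stringsAsFactors = FALSE
)

# table() stores the hours as labels, so convert them back to numbers
# to get a continuous x axis.
counts$Hour <- as.numeric(counts$Hour)

# Every (Hour, Offense.Type) pair is present, including zero counts,
# so each series has a value at every hour when stacked.
print(counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = Hour, y = numberOfCrimes, fill = Offense.Type)) +
    geom_area()
  print(p)
}
```

Because the areas are stacked, the top edge of the chart at each hour is the total number of incidents across all offense types, which is the "sum" behaviour asked about in the question.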

neerajt
  • THANK YOU SO MUCH! This is exactly what I have been looking for. I just have one more question though. Say I want to plot the data for just three types: "Aggravated Assault", "Murder", "Rape". How would I achieve that? – THIS USER NEEDS HELP Sep 25 '15 at 05:54
  • [See this answer here](http://stackoverflow.com/questions/1195826/drop-factor-levels-in-a-subsetted-data-frame). Look at the second answer (not the checkmarked one). Short answer is you'll want to use subset and droplevels. **subset()** does what you expect and **droplevels()** gets rid of the unused labels so they don't end up in your plot legend or messing it up somehow. – neerajt Sep 25 '15 at 06:25
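
A minimal sketch of the subset-then-droplevels approach from the last comment, on made-up data. The `incidents` data frame and the `keep` vector are hypothetical, and the offense labels are assumed to match the ones in your own data exactly.

```r
# Hypothetical stand-in for the crime data.
incidents <- data.frame(
  Hour = c(1, 2, 2, 3, 4, 5),
  Offense.Type = factor(c("Murder", "Theft", "Rape",
                          "Aggravated Assault", "Burglary", "Murder"))
)

# The offense types to keep (assumed labels).
keep <- c("Aggravated Assault", "Murder", "Rape")

# subset() keeps the matching rows; droplevels() then removes the
# now-unused factor levels so they don't show up in the plot legend.
violent <- droplevels(subset(incidents, Offense.Type %in% keep))

print(levels(violent$Offense.Type))
```

The filtered data frame can then be passed to either plotting method above in place of `crime`.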