3

What's the best way to deal with too many factor levels in a plot, assuming that the factor variable is ordered? The default doesn't look nice:

ggplot(data.frame(x=factor(trunc(runif(10000, 0, 100)), ordered=T)), aes(x=x)) +
  geom_histogram()

ugly example

krlmlr
  • 2
    I would tilt the labels by 45-90 degrees. http://stackoverflow.com/questions/1330989/rotating-and-spacing-axis-labels-in-ggplot2 If that doesn't work, you can set manual breaks. See `scale_x_discrete`. http://docs.ggplot2.org/current/scale_discrete.html – Roman Luštrik Apr 11 '13 at 14:21
  • 2
    `coord_flip` is also a nice alternative sometimes when you run into this. – Justin Apr 11 '13 at 14:24
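A minimal sketch of the `coord_flip` suggestion from the comments, under a few assumptions: the seed and the label size are arbitrary choices, and `geom_bar()` stands in for `geom_histogram()` because recent ggplot2 releases refuse to bin a discrete x variable.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only to make the sketch reproducible
df <- data.frame(x = factor(trunc(runif(10000, 0, 100)), ordered = TRUE))

# geom_bar() counts the rows per level; coord_flip() puts the levels on the
# vertical axis, where each label gets its own line
p_flip <- ggplot(df, aes(x = x)) +
  geom_bar() +
  coord_flip() +
  theme(axis.text.y = element_text(size = 6))  # assumed size, tune to taste

p_flip
```

With 100 levels the labels are still tight, so this tends to work best combined with a taller plot or thinned breaks.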

2 Answers

4

You can flip (rotate) the axis labels by 90 degrees.

ggplot(data.frame(x=factor(trunc(runif(10000, 0, 100)), ordered=T)), aes(x=x)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_histogram()

flip <- ggplot(data.frame(x=factor(trunc(runif(10000, 0, 100)), ordered=T)), aes(x=x)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_histogram()

If it's still too dense for your taste, you can set manual breaks. In this case, I label every fifth level.

prune <- ggplot(data.frame(x=factor(trunc(runif(10000, 0, 100)), ordered=T)), aes(x=x)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_x_discrete(breaks = seq(0, 100, by = 5)) +
  geom_histogram()

library(gridExtra)
grid.arrange(flip, prune)
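Note that each call above draws a fresh `runif` sample, so the two panels do not show the same data. A sketch that builds both plots from one data frame makes them directly comparable. The assumptions here: the seed is arbitrary, `base` and `brks` are names introduced only for this example, the breaks are given as character strings so they match the level labels, and `geom_bar()` replaces `geom_histogram()` because recent ggplot2 releases reject a discrete x there.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only to make the sketch reproducible
df <- data.frame(x = factor(trunc(runif(10000, 0, 100)), ordered = TRUE))

# Shared base plot: same data, same rotated labels
base <- ggplot(df, aes(x = x)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Every fifth level, written as the labels the factor actually carries
brks <- as.character(seq(0, 95, by = 5))

flip  <- base
prune <- base + scale_x_discrete(breaks = brks)

# gridExtra::grid.arrange(flip, prune)  # side-by-side layout as in the answer
```

Setting `breaks` on a discrete scale only controls which ticks and labels are drawn; the bars come from the data and are the same in both plots.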


Roman Luštrik
  • Sweet! I especially like the second option. Is there a way to infer the `100` inside the call? – krlmlr Apr 11 '13 at 14:37
  • Found it in the docs for [`discrete_scale`](http://docs.ggplot2.org/current/discrete_scale.html): `breaks=function(x) x[seq(1, length(x), by=5)]`. – krlmlr Apr 11 '13 at 14:46
  • The two plots look different, does setting `breaks` change what is actually displayed? – vashts85 May 06 '16 at 20:45
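The function form of `breaks` mentioned in the comments could be wrapped as follows. This is a sketch, not part of ggplot2: `every_nth` is a hypothetical helper name, the seed is arbitrary, and `geom_bar()` is used because recent ggplot2 releases reject `geom_histogram()` on a discrete x.

```r
library(ggplot2)

# Hypothetical helper: returns a function that keeps every n-th break.
# For non-empty x this matches x[seq(1, length(x), by = n)] from the comment,
# and it also tolerates an empty vector.
every_nth <- function(n) {
  function(x) x[(seq_along(x) - 1) %% n == 0]
}

set.seed(42)  # assumed seed, only to make the sketch reproducible
df <- data.frame(x = factor(trunc(runif(10000, 0, 100)), ordered = TRUE))

# The scale passes its limits (the factor levels) to the function,
# so the number of levels never has to be hard-coded
p <- ggplot(df, aes(x = x)) +
  geom_bar() +
  scale_x_discrete(breaks = every_nth(5)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

p
```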
2

Use a different method of visualization: dotplot(). You represent each frequency by a single dot, and you move the factor levels to the y axis so they are displayed horizontally rather than vertically. Combined with sorting by frequency, this gives you an easy visual indicator of the frequency of each level. It's a bit dense on the labels, but it still shows you the levels if you zoom in. Here is an example with lattice:

library(lattice)
d <- sort(table(factor(trunc(runif(10000, 0, 100)))))
dotplot(d, col=1, cex=0.5, scales = list(y = list(cex=0.5)))


But maybe what you want is something like a factor frequency histogram, although I don't know what you would use it for. Just don't rotate the x-axis labels; that makes them unreadable.

d <- factor(trunc(runif(10000, 0, 100)))
histogram(d, scales = list(x = list(at=seq(1,length(levels(d)),5))))


Geek On Acid
  • The histogram is silly, it should just serve as an example for how the x axis can become too dense. -- How is your answer related to ggplot? – krlmlr Apr 11 '13 at 20:08
  • I thought you were asking for a general R solution, not a ggplot-specific one. You mean dotplot? It is indeed too dense, but that's because your data is a bit dense for factor-wise plotting like this... – Geek On Acid Apr 11 '13 at 22:24