I am trying to create a pairs plot of 6 data variables using ggplot2 and colour the points according to the k-means cluster they belong to. I read the documentation of the highly impressive 'GGally' package as well as an informal fix by Adam Laiacano [http://adamlaiacano.tumblr.com/post/13501402316/colored-plotmatrix-in-ggplot2]. Unfortunately, I could not find any way to get the desired output in either.

Here is some sample code:

#The Swiss fertility dataset has been used here

data_ <- read.csv("/home/tejaskale/Ubuntu One/IUCAA/Datasets/swiss.csv", header=TRUE)
data_ <- na.omit(data_)

u <- c(2, 3, 4, 5, 6, 7)
x <- data_[,u]
k <- 3
maxIterations <- 100
noOfStarts <- 100
filename <- 'swiss.csv'

library(ggplot2)
library(gridExtra)
library(GGally)

kmeansOutput <- kmeans(x, k, maxIterations, noOfStarts)

xNew <- cbind(x[,1:6], as.factor(kmeansOutput$cluster))
names(xNew)[7] <- 'cluster'
kmeansPlot <- ggpairs(xNew[,1:6], color=xNew$cluster)

OR

kmeansPlot <- plotmatrix(xNew[,1:6], mapping=aes(colour=xNew$cluster))

Both plots are created but aren't coloured according to clusters.

Hope I haven't missed an answer to this question on the forum and apologize if that is indeed the case. Any help would be highly appreciated.

Thanks!

Gavin Simpson
tejas_kale

  • You can do that with the normal plot command as well by passing the clusterIDs in the `col` parameter. – Thomas Jungblut Jul 16 '12 at 12:38
  • Thanks for the answer, @ThomasJungblut. But I am not sure I completely understand it. Are you advising the use of facets? I tried playing with facet_grid using examples given on http://stackoverflow.com/questions/1313954/plotting-two-vectors-of-data-on-a-ggplot2-scatter-plot-using-r. They are not serving my purpose though. A minimal example would be of immense help for me to better understand your suggestion. Thanks once again! – tejas_kale Jul 16 '12 at 13:36
  • It is just a normal scatter plot of your points colored by the clusters. See the normal kmeans doc here: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/kmeans.html down at the bottom: `plot(x, col = cl$cluster)` where `cl$cluster` is the assignment to a cluster. – Thomas Jungblut Jul 16 '12 at 13:38
  • Okay, understood what you meant. But I am looking to generate this plot using 'ggplot2' and I don't think I can substitute 'plot' with 'qplot' here. Any idea how I could go about this using 'ggplot2'? – tejas_kale Jul 16 '12 at 14:14
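
For reference, the base-graphics suggestion in the comments above extends from a single scatter plot to a full pairs plot. This is a minimal sketch, assuming the built-in `swiss` dataset (six numeric columns) stands in for the CSV from the question; the seed and the `nstart` value are arbitrary choices, not taken from the question:

```r
# Built-in swiss data, assumed here to match the question's CSV
x <- swiss

set.seed(1)  # arbitrary seed so the clustering is reproducible
cl <- kmeans(x, centers = 3, iter.max = 100, nstart = 25)

# One scatter panel per pair of variables, coloured by cluster id (1, 2 or 3)
pairs(x, col = cl$cluster, pch = 19)
```

This does not use ggplot2, so it only illustrates what the comment meant by passing the cluster ids to `col`.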

1 Answer

The following slight modification of plotmatrix2 works fine for me:

plotmatrix2 <- function (data, mapping = aes())
{
    # Every ordered pair of distinct columns becomes one scatter panel
    grid <- expand.grid(x = 1:ncol(data), y = 1:ncol(data))
    grid <- subset(grid, x != y)
    # Long-format data: the x/y values for each panel, plus the original
    # columns so that aesthetics such as colour can refer to them by name
    all <- do.call("rbind", lapply(1:nrow(grid), function(i) {
        xcol <- grid[i, "x"]
        ycol <- grid[i, "y"]
        data.frame(xvar = names(data)[ycol], yvar = names(data)[xcol],
            x = data[, xcol], y = data[, ycol], data)
    }))
    # Force the faceting variables to be factors in column order
    all$xvar <- factor(all$xvar, levels = names(data))
    all$yvar <- factor(all$yvar, levels = names(data))
    # Diagonal panels: the values of each single variable, for a density curve
    densities <- do.call("rbind", lapply(1:ncol(data), function(i) {
        data.frame(xvar = names(data)[i], yvar = names(data)[i],
            x = data[, i])
    }))
    densities$xvar <- factor(densities$xvar, levels = names(data))
    densities$yvar <- factor(densities$yvar, levels = names(data))
    # Add the x and y mappings unless the caller already supplied them
    mapping <- defaults(mapping, aes_string(x = "x", y = "y"))
    class(mapping) <- "uneval"
    # Scatter panels off the diagonal; densities rescaled to each variable's range
    ggplot(all) + facet_grid(xvar ~ yvar, scales = "free") +
        geom_point(mapping, na.rm = TRUE) + stat_density(aes(x = x,
        y = ..scaled.. * diff(range(x)) + min(x)), data = densities,
        position = "identity", colour = "grey20", geom = "line")
}


plotmatrix2(mtcars[,1:3],aes(colour = factor(cyl)))

[Image: output of the plotmatrix2 call above]

It may be a ggplot2 version issue, but I had to force the faceting variables in the densities data frame to be factors (that seems broken to me even in the GGally version). Also, generally don't pass vectors to aes(), but simply column names.
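
The same advice applies to the `ggpairs` call from the question: map the colour to a column name inside `aes()` rather than passing a vector. This is a minimal sketch, assuming a GGally version whose `ggpairs()` accepts `mapping` and `columns` arguments, and using the built-in `swiss` dataset in place of the question's CSV (the seed and `nstart` are arbitrary):

```r
library(ggplot2)
library(GGally)

set.seed(1)  # arbitrary seed so the clustering is reproducible
km <- kmeans(swiss, centers = 3, iter.max = 100, nstart = 25)

# Store the cluster labels as a factor column so aes() can refer to it by name
swiss_clustered <- cbind(swiss, cluster = factor(km$cluster))

# Plot only the six data columns; colour every panel by the cluster column
kmeansPlot <- ggpairs(swiss_clustered, mapping = aes(colour = cluster), columns = 1:6)
print(kmeansPlot)
```

Keeping `cluster` in the data frame while restricting `columns` to 1:6 lets the colour mapping find the column without drawing it as a seventh variable.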

joran

  • this worked for me though still trying to understand the functioning of the code from 'defaults' onwards. also, thanks for the tip regarding 'aes()'. – tejas_kale Jul 20 '12 at 06:16
  • With the most recent ggplot2 version (iirc 0.9.3.1), this produces `could not find function "defaults"`. – bluenote10 Nov 28 '14 at 15:29
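
One possible workaround for the missing function, assuming `defaults` only needs to fill in list entries that the caller did not supply (which is how the function of the same name in plyr behaves), is to define it before calling `plotmatrix2`. This is a sketch under that assumption, not a fix verified against every ggplot2 version:

```r
# Hypothetical stand-in for the missing helper: keep everything in x,
# and add only those entries of y whose names are absent from x
defaults <- function(x, y) {
    c(x, y[setdiff(names(y), names(x))])
}

# Example with plain lists: colour is kept from the first list,
# while x and y are filled in from the second
merged <- defaults(list(colour = "cyl"),
                   list(x = "x", y = "y", colour = "black"))
str(merged)
```

Inside `plotmatrix2` this means that a user-supplied mapping such as `aes(colour = factor(cyl))` is kept, and only the `x` and `y` mappings are added to it.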