I am struggling with ggplot2. Despite finding several quite similar questions, I haven't managed to get this to work. I want to reorder both the rows and the columns of a heatmap based on hierarchical clustering.

Here is my current code:

# import
library("ggplot2")
library("scales")
library("reshape2")

# data loading
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t')

# clustering with hclust on row and on column
dd.col <- as.dendrogram(hclust(dist(data_frame)))
dd.row <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
col.ord <- order.dendrogram(dd.col)
row.ord <- order.dendrogram(dd.row)


# making a new data frame reordered 
new_df = as.data.frame(data_frame[col.ord, row.ord])
print(new_df)   # when manually inspecting new_df, the reordering seems to work

# get the row name
name = as.factor(row.names(new_df))

# reshape
melte_df = melt(cbind(name, new_df))

# the solution: reset the factor levels of the name column so they follow the clustered row order of new_df
melte_df$name = factor(melte_df$name, levels = row.names(new_df))

# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value),
 colour = "white") + scale_fill_gradient(low = "white",
 high = "steelblue") + theme(text=element_text(size=12),
 axis.text.y=element_text(size=3)))

# save fig
ggsave(file = "test.pdf")

# The result is ordered only by column. What have I missed?

I am quite new to R, so a detailed answer would be very welcome.

RomainL.

1 Answer

Without an example dataset to reproduce the issue, I can't be 100% sure, but I would guess that your problem lies in this line:

name = as.factor(row.names(new_df))

When you use a factor, the plotting order is based on the order of that factor's levels, not on the order of the rows. You can reorder your data frame as much as you want; the order used when plotting will still be the order of the levels.

Here's an example:

data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
data_frame
       x  y
1  apple 50
2 banana 30
3  peach 70

data_frame$x <- as.factor(data_frame$x) # Make x column a factor

levels(data_frame$x) # This shows the levels of your factor
[1] "apple"  "banana" "peach" 

data_frame <- data_frame[order(data_frame$y),] # Order by value of y
data_frame
       x  y
2 banana 30
1  apple 50
3  peach 70

# Now let's plot it:
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

This is the result:

[Figure: bar chart with the bars in factor-level order: apple, banana, peach]

See? It's not ordered by the y value as we wanted. It's ordered by the levels of the factor. Now, if that's indeed where your problem lies, there are solutions in the question "R - Order a factor based on value in one or more other columns".
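As a minimal base R sketch of the same idea (no extra packages), the levels can be set explicitly from the order of another column. The data frame `df` below is a hypothetical copy of the toy data above, not the dataset from the question.

```r
# Hypothetical toy data mirroring the example above
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Set the levels explicitly, in increasing order of y
df$x <- factor(df$x, levels = df$x[order(df$y)])

levels(df$x)
# [1] "banana" "apple"  "peach"

# reorder() from the stats package derives the same level order from y
by_y <- reorder(factor(c("apple", "banana", "peach")), c(50, 30, 70))
levels(by_y)
# [1] "banana" "apple"  "peach"
```

Setting `levels` directly is the more general tool, since any ordering vector (for example one produced by a clustering) can be plugged in.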

An applied example of that solution with dplyr:

library(dplyr)
data_frame <- data_frame %>%
       arrange(y) %>%          # sort your dataframe
       mutate(x = factor(x, levels = x)) # reset the factor levels to follow that order

data_frame
       x  y
1 banana 30
2  apple 50
3  peach 70

levels(data_frame$x) # Levels of the factor are reordered!
[1] "banana" "apple"  "peach" 

p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

This is the result now:

[Figure: bar chart with the bars now ordered by y: banana, apple, peach]
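The same idea should carry over to a clustered heatmap: cluster the rows and the columns separately, then use each clustering order to set the factor levels of the matching axis. The sketch below is only an illustration under assumptions: `mat` is a made-up random matrix standing in for your data, and the names `row_ord`, `col_ord`, `long`, `row` and `col` are mine, not from your code.

```r
library(ggplot2)
library(reshape2)

# Hypothetical toy matrix standing in for the real data
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# dist() measures distances between rows, so transpose to cluster the columns
row_ord <- order.dendrogram(as.dendrogram(hclust(dist(mat))))
col_ord <- order.dendrogram(as.dendrogram(hclust(dist(t(mat)))))

# Long format with one line per cell: row name, column name, value
long <- melt(mat, varnames = c("row", "col"))

# The plotting order comes from the factor levels, so set them from the clustering
long$row <- factor(long$row, levels = rownames(mat)[row_ord])
long$col <- factor(long$col, levels = colnames(mat)[col_ord])

p <- ggplot(long, aes(col, row)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")
```

Note that the rows of `long` are never reordered here; only the levels change, which is all ggplot2 looks at when it lays out a discrete axis.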

I hope this helps. Otherwise, you might want to share an example of your original dataset!

agatheblues
  • Your answer was really useful in pointing out the problem. In the end, though, I found a more convenient way: reordering the factor levels. I will edit my question to add what made it work, but thanks again for your help. – RomainL. Aug 07 '17 at 11:33