I need to make a heatmap with the `pheatmap` function, using UPGMA clustering and 1 - Pearson correlation as the distance metric. My professor says this is the default distance metric, but in my case pheatmap uses Euclidean distance. Are Euclidean distance and 1 - Pearson correlation the same thing, or is he wrong? If he is wrong, how can I use the correct distance metric for my heatmap?

My input

ph = pheatmap(avgreltlog10,
              color = colorRampPalette(rev(brewer.pal(n = 7, name = "RdYlBu")))(100),
              kmeans_k = NA, breaks = NA, border_color = "grey60",
              cellwidth = 10, cellheight = 10, scale = "none", cluster_rows = TRUE,
              clustering_method = "average", cutree_rows = 4, cutree_cols = 2)

R output

$tree_row

Call:
hclust(d = d, method = method)

Cluster method   : average 
Distance         : euclidean 
Number of objects: 65 


$tree_col

Call:
hclust(d = d, method = method)

Cluster method   : average 
Distance         : euclidean 
Number of objects: 10 
Sam Vanbergen
  • The distance is passed to the internal function `pheatmap:::cluster_mat`. Examining the source code, if you specify "correlation", then `d = as.dist(1 - cor(t(mat)))` – rawr Dec 18 '17 at 19:20
  • Could you elaborate? Right now I have included `clustering_distance_rows = "correlation", clustering_distance_cols = "correlation"` as arguments in the `pheatmap()` call, but I'm unsure where to put `d = as.dist(1 - cor(t(mat)))`. And what should I do with `pheatmap:::cluster_mat`? – Sam Vanbergen Dec 19 '17 at 15:35
  • `pheatmap:::cluster_mat` is just an internal function. Type it into your console to see its source. You don't have to do anything with `d = ...`; `pheatmap:::cluster_mat` does that for you. – rawr Dec 19 '17 at 16:30

1 Answer

You can easily check the default settings by typing the function name without parentheses in the R console:

> pheatmap

If you do that, you can see that Euclidean distance is the default:

... clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", clustering_method = "complete", ...
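The same check can also be done programmatically instead of reading the printed source. A minimal sketch, assuming the pheatmap package is installed; `formals()` is base R and returns a function's arguments together with their default values:

```r
library(pheatmap)

# Pull out the default values of the clustering-related arguments.
defaults <- formals(pheatmap)[c("clustering_distance_rows",
                                "clustering_distance_cols",
                                "clustering_method")]
str(defaults)
```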

To use 1 - Pearson correlation, specify it explicitly:

cluster_rows = TRUE,
clustering_distance_rows = "correlation"
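For the setup in the question (UPGMA, i.e. `clustering_method = "average"`, with 1 - Pearson correlation on both rows and columns), a complete call might look like the sketch below. The matrix `mat` is random stand-in data with the same 65 x 10 shape as the question's input, not the real `avgreltlog10`, and `silent = TRUE` is only there to suppress drawing so the trees can be inspected.

```r
library(pheatmap)

# Hypothetical stand-in for the question's data: 65 rows, 10 columns.
set.seed(1)
mat <- matrix(rnorm(65 * 10), nrow = 65,
              dimnames = list(paste0("row", 1:65), paste0("col", 1:10)))

ph <- pheatmap(mat,
               scale = "none",
               cluster_rows = TRUE, cluster_cols = TRUE,
               clustering_distance_rows = "correlation",  # 1 - Pearson
               clustering_distance_cols = "correlation",
               clustering_method = "average",             # UPGMA
               cutree_rows = 4, cutree_cols = 2,
               silent = TRUE)

# Cross-check: rebuild the row tree by hand from 1 - Pearson correlation
# and compare its merge order with the tree pheatmap returned.
manual_row <- hclust(as.dist(1 - cor(t(mat))), method = "average")
identical(ph$tree_row$merge, manual_row$merge)
```

Note that the printed tree may no longer show a `Distance` line in this case, since a distance object built with `as.dist()` carries no method label. Comparing against a hand-built tree, as above, is a more direct check.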

This works because, if you dig into the code, you can see that pheatmap calls `cluster_mat`, which does this:

cluster_mat = function(mat, distance, method){
...
    if(distance[1] == "correlation"){
        d = as.dist(1 - cor(t(mat)))
    }
...
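This also settles whether the two metrics are the same: they are not. A small base-R sketch with made-up toy rows (not the question's data) shows them ranking the same pairs differently. `cor()` correlates columns, which is why the matrix is transposed to compare rows.

```r
# Three toy profiles: y is x scaled by 10, z is x reversed.
mat <- rbind(x = c(1, 2, 3, 4),
             y = c(10, 20, 30, 40),
             z = c(4, 3, 2, 1))

d_cor <- as.dist(1 - cor(t(mat)))         # what "correlation" computes
d_euc <- dist(mat, method = "euclidean")  # pheatmap's default

round(as.matrix(d_cor), 3)
round(as.matrix(d_euc), 3)
```

Here x and y are perfectly correlated, so their correlation distance is 0 even though they are far apart in Euclidean terms. x and z are close in Euclidean terms but perfectly anti-correlated, giving a correlation distance of 2.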

More information is available in the official documentation. There are so many packages around that it's not uncommon to mix things up :)

SplitInf