Question

Pearson correlation or Euclidean distance for clustering?

1

4.9 years ago

mrashad ▴ 80

I have a multi-omics expression matrix and need to cluster it using hierarchical clustering and k-means, but I am unsure which distance measure to use: Euclidean distance or Pearson correlation.

Is there any guidance on which of them should be used for expression data?

gene-expression • 9.3k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 4.9 years ago by mrashad ▴ 80

score 5 · Answer 1 · 2020-09-10

5

4.9 years ago

Kevin Blighe 89k

There is neither a guide nor a standard for this.

If using either Euclidean distance or Pearson correlation, your data should follow a Gaussian / normal (parametric) distribution. So, for microarray data, anything from RMA normalisation is fine, whereas for RNA-seq, any data derived from a transformed, normalised count metric should be fine, such as variance-stabilised, regularised log, or log CPM expression levels.

If you are performing clustering on non-normal data, like 'normalised' [non-transformed] RNA-seq counts, FPKM expression units, etc., then use Spearman correlation (non-parametric).
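To illustrate the difference, here is a minimal sketch in plain Python. The counts are made up, `log2(count + 1)` stands in for a proper variance-stabilising or log CPM transformation, and the helper functions `pearson`, `ranks` and `spearman` are written out by hand for illustration only (in practice a statistics library would normally be used). Because Spearman correlation works on ranks, it should come out the same before and after a monotonic transformation, whereas Pearson correlation on the raw counts can be dominated by the few largest values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical raw counts for two genes across eight samples (made-up values)
gene_a = [3, 10, 25, 60, 150, 400, 1200, 9000]
gene_b = [5, 8, 40, 35, 300, 250, 2000, 5000]

# Simple stand-in for a log-type transformation (assumption, not RMA/VST/rlog)
log_a = [math.log2(v + 1) for v in gene_a]
log_b = [math.log2(v + 1) for v in gene_b]

print("Pearson  raw:", round(pearson(gene_a, gene_b), 3))
print("Pearson  log:", round(pearson(log_a, log_b), 3))
print("Spearman raw:", round(spearman(gene_a, gene_b), 3))
print("Spearman log:", round(spearman(log_a, log_b), 3))
```

For clustering, a correlation `r` is usually turned into a dissimilarity such as `1 - r` before it is passed to the clustering function.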

As usual, get intimate with your data, know its distribution, and thereafter choose the appropriate method(s).

Kevin

ADD COMMENT • link 4.9 years ago by Kevin Blighe 89k

3

A good point to raise is the importance of the data distribution when choosing a distance measure for clustering analysis. Thanks.

This is my understanding of how Euclidean distance and Pearson correlation distance differ when applied to gene expression clustering. When we are interested in the overall shape of the expression profiles (the pattern of up- and down-regulation), correlation-based measures (e.g. Pearson correlation) would be the choice. In other cases, we may want to cluster observations with the same magnitude of dysregulation together, so that observations with high feature values cluster together. In these cases, Euclidean distance would be our choice for calculating the dissimilarity matrix.
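A small sketch of this contrast, using three made-up log-expression profiles and a hand-written `pearson` helper (both are assumptions for illustration). `gene_low` and `gene_high` share the same up-then-down pattern at different levels, while `gene_flat` sits at the same level as `gene_low` but follows a different pattern. Correlation distance is taken here as `1 - r`, which is one common convention.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """Dissimilarity based on Pearson correlation: 0 for identical shapes."""
    return 1.0 - pearson(x, y)

# Hypothetical log-expression profiles across five samples (made-up values)
gene_low = [1.0, 2.0, 3.0, 2.0, 1.0]    # up-then-down pattern, low level
gene_high = [6.0, 7.0, 8.0, 7.0, 6.0]   # same pattern, shifted up by 5
gene_flat = [1.5, 1.5, 2.5, 2.5, 1.5]   # low level, different pattern

# Euclidean distance compares magnitudes sample by sample
euclid_low_high = math.dist(gene_low, gene_high)
euclid_low_flat = math.dist(gene_low, gene_flat)

# Correlation distance compares the shape of the profiles only
corr_low_high = correlation_distance(gene_low, gene_high)
corr_low_flat = correlation_distance(gene_low, gene_flat)

print("Euclidean   low-high:", round(euclid_low_high, 3),
      " low-flat:", round(euclid_low_flat, 3))
print("Correlation low-high:", round(corr_low_high, 3),
      " low-flat:", round(corr_low_flat, 3))
```

With these values, Euclidean distance places `gene_low` nearer to `gene_flat` (similar magnitude), while correlation distance places it nearer to `gene_high` (same pattern), so the two measures would group these genes differently.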

ADD REPLY • link 3.5 years ago by Hamid Ghaedi 3.3k

0

Thanks a lot, I got it.

ADD REPLY • link 4.9 years ago by mrashad ▴ 80

1

I got it, thanks a lot for this fruitful answer.

ADD REPLY • link 4.9 years ago by mrashad ▴ 80