Clustering and dynamic tree cutting

Asked by harish, 23 months ago

I am trying to cut a dendrogram with the dynamicTreeCut package, because I prefer dynamic tree cutting to a fixed-height cut. I ran the following code:

clusDyn <- cutreeDynamic(hr, distM = as.matrix(as.dist(1-cor(t(scaledata)))), method = "hybrid")

However, it produces 160 clusters, which is too many to analyze individually. Is it possible to cut the tree dynamically but still group the genes so that a specific number of clusters is produced? For example, I would like 20 clusters after the dynamic tree cut instead of 160.

I know that cutting the dendrogram at a fixed height would let me control the number of clusters, but I prefer dynamic tree cutting.
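To illustrate the trade-off described in the question, here is a minimal sketch on simulated data. The matrix, the four planted expression patterns and k = 20 are assumptions for illustration, not the poster's data. cutreeDynamic chooses the number of clusters itself, while base R's cutree(hr, k = 20) cuts the same dendrogram statically into exactly 20 groups.

```r
library(dynamicTreeCut)

# Hypothetical stand-in for 'scaledata': 600 genes x 12 samples,
# built from 4 planted expression patterns plus noise.
set.seed(1)
nSamples <- 12
patterns <- matrix(rnorm(4 * nSamples), nrow = 4)
groupOf  <- rep(1:4, each = 150)
expr <- patterns[groupOf, ] + matrix(rnorm(600 * nSamples, sd = 0.5), nrow = 600)
scaledata <- t(scale(t(expr)))  # scale each gene (row)

# 1 - Pearson correlation between genes, as in the question
dissM <- as.matrix(as.dist(1 - cor(t(scaledata), method = "pearson")))
hr <- hclust(as.dist(dissM), method = "complete")

# Dynamic cut: the algorithm decides how many clusters there are
clusDyn <- cutreeDynamic(hr, distM = dissM, method = "hybrid")

# Static cut: the number of clusters is fixed in advance
clusStatic <- cutree(hr, k = 20)

nDynamic <- length(setdiff(unique(clusDyn), 0))  # label 0 marks unassigned genes
nStatic  <- length(unique(clusStatic))           # 20 by construction
```

Fixing k only works for a static cut. With cutreeDynamic, the cluster count can only be steered indirectly through its parameters.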

Tags: dynamicTreeCut, Clustering, data, RNAseq, cutreeDynamic

Thread updated 6 months ago by andres.firrincieli.

Regarding "it produces 160 clusters": this happens because the input is a plain correlation matrix, which is affected by spurious or missing connections (see this paper).

Reply by andres.firrincieli, 23 months ago

I am very new to RNA-seq analysis and clustering. Could you please elaborate? Do you mean that Pearson correlation is not enough for this clustering and that I should look at other methods? Would WGCNA be a better workflow?

Reply by harish, 23 months ago

Help me understand: is this a clustering analysis of differentially expressed genes, or an unsupervised clustering analysis (e.g. WGCNA)?

Reply by andres.firrincieli, 23 months ago

These are differentially expressed genes, around 15K out of a total of 30K genes. I then follow the clustering protocol described at this link (the genes are scaled and then clustered by Pearson correlation): https://2-bitbio.com/2017/04/clustering-rnaseq-data-making-heatmaps.html

Reply by harish, 23 months ago

I don't think cutreeDynamic will work very well with a distance matrix calculated from Pearson correlation values, as.matrix(as.dist(1-cor(t(scaledata)))). Just to be sure, how did you calculate hr? (The link doesn't work for me.)

Reply by andres.firrincieli, 23 months ago

Thank you for the effort. I did calculate hr as you showed:

hr <- hclust(as.dist(1-cor(t(scaledata), method="pearson")), method="complete")

Since Pearson correlation values seem not to work well with cutreeDynamic, could you suggest something I can look into to build a better correlation matrix?

Reply by harish, 23 months ago

Regarding "can you please suggest something that I can look into, to make a better correlation matrix?": I am not familiar with workflows for detecting clusters of differentially expressed genes. What I can tell you is that cutreeDynamic, with its default settings, doesn't work very well when the distance matrix is calculated from Pearson correlation values alone.

If you want to use cutreeDynamic, there are settings you can change in order to reduce the number of clusters, for example minClusterSize, deepSplit, cutHeight, and maxCoreScatter (see usage).
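A minimal sketch of this kind of tuning, on simulated data. The 600 genes, the four planted patterns and the grid values are assumptions chosen for illustration. It reruns cutreeDynamic over a small grid of minClusterSize and deepSplit values and counts the clusters each setting yields. Typically, a larger minClusterSize and a smaller deepSplit give fewer, larger clusters, but the exact counts depend on the data.

```r
library(dynamicTreeCut)

# Hypothetical data: 600 genes x 12 samples from 4 planted patterns plus noise
set.seed(1)
nSamples <- 12
patterns <- matrix(rnorm(4 * nSamples), nrow = 4)
groupOf  <- rep(1:4, each = 150)
expr <- patterns[groupOf, ] + matrix(rnorm(600 * nSamples, sd = 0.5), nrow = 600)

dissM <- as.matrix(as.dist(1 - cor(t(expr))))
hr <- hclust(as.dist(dissM), method = "complete")

# Grid of the two parameters; other arguments stay at their defaults
settings <- expand.grid(minClusterSize = c(20, 50, 100), deepSplit = 0:2)
settings$nClusters <- mapply(function(mcs, ds) {
  labels <- cutreeDynamic(hr, distM = dissM, method = "hybrid",
                          minClusterSize = mcs, deepSplit = ds, verbose = 0)
  length(setdiff(unique(labels), 0))  # count clusters, ignoring label 0 (unassigned)
}, settings$minClusterSize, settings$deepSplit)

print(settings)
```

cutHeight and maxCoreScatter could be added to the grid in the same way.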

Reply by andres.firrincieli, 23 months ago

Hi andres.firrincieli,

Although it's late, I hope you can still help. Regarding cutreeDynamic, you recommended changing some settings such as cutHeight. Does that mean we have to choose a cutHeight value even with cutreeDynamic? My understanding was that cutHeight does not need to be specified explicitly for cutreeDynamic; is that not correct?

Reply by seta, 6 months ago

I typically set the minimum cluster size (minClusterSize) to 100 and leave the other parameters at their default settings.
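A sketch of that setting, again on hypothetical simulated data: only minClusterSize = 100 is set and everything else, including cutHeight, is left at its default. As we read the dynamicTreeCut documentation, cutHeight is optional. For method = "hybrid" it defaults to 99% of the range between the 5th percentile and the maximum of the dendrogram's joining heights, so it only needs to be set if you want to override that choice.

```r
library(dynamicTreeCut)

# Hypothetical data: 600 genes x 12 samples from 4 planted patterns plus noise
set.seed(1)
nSamples <- 12
patterns <- matrix(rnorm(4 * nSamples), nrow = 4)
groupOf  <- rep(1:4, each = 150)
expr <- patterns[groupOf, ] + matrix(rnorm(600 * nSamples, sd = 0.5), nrow = 600)

dissM <- as.matrix(as.dist(1 - cor(t(expr))))
hr <- hclust(as.dist(dissM), method = "complete")

# Only minClusterSize is changed; cutHeight, deepSplit, etc. keep their defaults
clusDyn <- cutreeDynamic(hr, distM = dissM, method = "hybrid",
                         minClusterSize = 100)

# Cluster sizes; a label of 0, if present, marks genes left unassigned
clusterSizes <- table(clusDyn)
print(clusterSizes)
```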

Reply by andres.firrincieli, 6 months ago