Question

Problem in hierarchical clustering

0

5.4 years ago

Calangoa ▴ 30

Hi there, I have a problem with my hierarchical clustering and would appreciate any help. Let me start from the first step. To identify differentially expressed genes in several microarray studies (each study consists of 3 individual datasets; collectively I have 15 datasets), I used the limma package from Bioconductor in R. I then selected the genes with an adj. P-value less than 0.05. After that, I extracted a set of genes involved in, for example, the cell cycle. Finally, this set of genes, with their expression expressed as log fold change, was used for hierarchical clustering. From what I have read, Euclidean distance with complete linkage is the best choice for log-transformed data, but when I clustered the 15 datasets, the data from the same study surprisingly grouped together in one cluster. What can I do about this misleading result? Would it be possible to use only one control for all the treatment data from the different studies in R? Or should I take another approach?

Many thanks in advance
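
For readers following along, the clustering step described above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the matrix `lfc`, the gene names, and the five study labels are simulated placeholders, not the poster's data.

```r
# Simulated stand-in for the real input (an assumption, not the poster's data):
# 50 genes x 15 datasets of log fold changes, 5 studies with 3 datasets each.
set.seed(1)
study <- rep(paste0("study", 1:5), each = 3)
lfc <- matrix(rnorm(50 * 15), nrow = 50,
              dimnames = list(paste0("gene", 1:50),
                              paste0(study, "_d", rep(1:3, times = 5))))

# dist() computes distances between rows, so transpose the matrix
# to cluster the datasets (columns) rather than the genes.
d  <- dist(t(lfc), method = "euclidean")
hc <- hclust(d, method = "complete")

# Cut the tree into 5 groups and cross-tabulate against the study labels
# to see whether the datasets group by study.
clusters <- cutree(hc, k = 5)
print(table(study, clusters))
```

Cross-tabulating the cluster assignment against the study label is a quick way to check whether the grouping mostly follows the study of origin.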

clustring microarray • 2.5k views

ADD COMMENT • link updated 5.4 years ago by leaodel ▴ 190 • written 5.4 years ago by Calangoa ▴ 30

1

Can you show the design matrix, and especially if and how you checked and/or compensated for potential batch effects?

ADD REPLY • link 5.4 years ago by ATpoint 85k

0

Here is the image of the hierarchical clustering

I think my mistake is that I didn't consider the batch effect. I normalized each study separately and then clustered them together. How can I compensate for the batch effect, and in what way? Would it be a good idea to normalize all datasets together? I don't know how that would be possible, though. Any suggestions?

ADD REPLY • link updated 5.4 years ago by Ram 44k • written 5.4 years ago by Calangoa ▴ 30

0

Please edit this post and see the changes I've made to see how to add images properly.

Images should be added using the image button, not the link button. You'll need the direct link to the image, not the link to the page hosting the image.

ADD REPLY • link 5.4 years ago by Ram 44k

0

If you normalize each study separately, then this result is completely normal and expected: the datasets of a single study are only scaled within that study, not relative to the other studies. If you do z-scoring, then you at least have to normalize them all together, leaving aside the question of whether comparing values from different studies makes sense at all, given the batch effect.
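
As a rough illustration of z-scoring all datasets together, here is a minimal base R sketch. The matrix `expr` is simulated and purely hypothetical; the step only puts each gene on a common scale across all 15 columns and does not by itself remove study-specific effects.

```r
# Simulated log-expression matrix (an assumption for illustration only):
# 50 genes x 15 datasets.
set.seed(1)
expr <- matrix(rnorm(50 * 15, mean = 8), nrow = 50)

# scale() standardises columns, so transpose first: each gene (row) is
# centred and scaled across all 15 datasets at once, not within each study.
z_all <- t(scale(t(expr)))

# Each gene now has mean 0 and standard deviation 1 across all datasets.
print(range(rowMeans(z_all)))
```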

ADD REPLY • link 5.4 years ago by ATpoint 85k

0

I know, but I want to normalize them to compensate for the batch effect and to find which data are closest to CM, regardless of which study each dataset belongs to. Is there any way?

ADD REPLY • link 5.4 years ago by Calangoa ▴ 30

score 2 · Answer 1 · 2019-07-17

2

5.4 years ago

leaodel ▴ 190

If you have a known batch effect and plan to visualize your data you'll need to correct the log-transformed data for this batch effect. I use limma::removeBatchEffect.
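
A minimal sketch of that call, assuming the study of origin is the known batch variable. The matrix `expr`, the `batch` factor, and the added per-study shifts are all simulated for illustration. Note that if every study contains a different treatment, batch and biology are confounded and this correction would remove the biological signal along with the batch effect.

```r
library(limma)

# Simulated log-expression data (hypothetical): 100 genes x 15 datasets,
# 5 studies of 3 datasets each, with an artificial per-study shift added.
set.seed(1)
batch <- factor(rep(paste0("study", 1:5), each = 3))
expr  <- matrix(rnorm(100 * 15), nrow = 100)
shift <- rnorm(5, sd = 3)
expr  <- sweep(expr, 2, shift[as.integer(batch)], "+")

# Remove the study effect from the log-scale values. The corrected matrix is
# meant for visualisation and clustering, not for refitting linear models.
corrected <- removeBatchEffect(expr, batch = batch)

# Cluster the datasets (columns) on the corrected values.
hc <- hclust(dist(t(corrected), method = "euclidean"), method = "complete")
```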

ADD COMMENT • link 5.4 years ago by leaodel ▴ 190

0

No, I don't know; I just want to. After clustering I found that the different datasets from one study sit close together in one cluster, but that is not correct when they are compared with the CM data. How can I do a meta-analysis and normalize microarray data from different studies?

ADD REPLY • link 5.4 years ago by Calangoa ▴ 30

1

So a batch effect is not something that you'll correct by means of normalization. You have to use a method designed to measure the variance attributed to the batch variable and then correct for it. If you have a hidden batch effect you can use sva.

The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv).

Once the batch effect is removed, you can proceed to the hierarchical clustering.
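
A hedged sketch of option (2), ComBat followed by clustering, assuming the study label is used as the known batch. The matrix `expr`, the `batch` factor, and the simulated shifts are illustrative assumptions, not data from this thread.

```r
library(sva)

# Simulated log-expression data (hypothetical): 100 genes x 15 datasets,
# 5 studies of 3 datasets each, with an artificial per-study shift added.
set.seed(1)
batch <- factor(rep(paste0("study", 1:5), each = 3))
expr  <- matrix(rnorm(100 * 15), nrow = 100)
shift <- rnorm(5, sd = 3)
expr  <- sweep(expr, 2, shift[as.integer(batch)], "+")

# Adjust for the known batch. No biological covariates are protected here
# (mod is left at its default), which is an assumption of this sketch.
adjusted <- ComBat(dat = expr, batch = batch)

# Hierarchical clustering of the datasets on the batch-adjusted values.
hc <- hclust(dist(t(adjusted), method = "euclidean"), method = "complete")
```

If a biological variable of interest is known for each dataset, it can be passed through a model matrix in the `mod` argument so that ComBat does not remove it together with the batch effect.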

ADD REPLY • link 5.4 years ago by leaodel ▴ 190

0

Calangoa, if the answer was helpful to solve your problem, please accept it as an answer.

ADD REPLY • link 5.4 years ago by leaodel ▴ 190