Question

RNA-seq differential expression : relevance of mixing several datasets

0

5.4 years ago

guillaume.rbt ★ 1.0k

Hi all,

I'm currently running differential expression analyses on several public RNA-seq datasets from different studies.

Each dataset contains samples from two conditions: "responders" and "non-responders".

I'm wondering if I should analyse each dataset separately or if I should mix all samples in one meta-dataset, while correcting for batch effects.

When I analyse each dataset separately, the resulting lists of differentially expressed genes have no genes in common. With that in mind, could pooling all the datasets still yield differentially expressed genes, or is it unrealistic to expect that?

Thanks for any input.

RNA-Seq differential expression • 1.3k views

updated 5.4 years ago by leaodel ▴ 190 • written 5.4 years ago by guillaume.rbt ★ 1.0k

0

Zero overlap between the result sets is concerning. How consistent are the experimental designs?

5.4 years ago by jared.andrews07 ★ 18k

0

The experimental designs are quite consistent on paper, but since these are tumor biopsy samples, I'm afraid the different sample-processing procedures could introduce a lot of noise.

5.4 years ago by guillaume.rbt ★ 1.0k

score 1 · Answer 1 · 2019-07-17

Process each dataset separately first. Once you have the expression matrix, use a method such as PCA to check whether the variability in your data is driven by the condition of interest. If it is not, and the samples instead group by study or processing batch, you have a batch effect in your dataset. If you know what is causing the batch effect, you can either remove it with the limma function removeBatchEffect() or include the batch as a term in the design formula if you're using DESeq2 (the latter is the indicated route for DESeq2 DE analysis). If you don't know what is causing the batch effect, you can use a tool like sva to find surrogate variables that stand in for the unknown batch variables, and then either model them during your DE analysis or correct for them beforehand, depending on which tool you use for the DE test.
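
To make the two ideas above concrete (a PCA check for what drives the variance, and putting batch in the model rather than ignoring it), here is a rough sketch in Python/NumPy on simulated log-expression data. It is only an illustration of the logic and not the limma, DESeq2 or sva API: the function names (`pca_scores`, `r_squared`, `condition_effect_adjusted`), the three fake studies and the effect sizes are all made up for the example. For real count data you would use the R tools themselves, for instance a DESeq2 design along the lines of `~ batch + condition`.

```python
import numpy as np


def pca_scores(log_expr, n_components=2):
    """PCA via SVD on a samples x genes matrix of log-expression values.

    Returns the sample scores and the fraction of variance per component.
    """
    centered = log_expr - log_expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, var_ratio[:n_components]


def r_squared(values, labels):
    """Fraction of variance in `values` explained by group labels (one-way ANOVA R^2)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = np.sum((values - values.mean()) ** 2)
    within = sum(
        np.sum((values[labels == g] - values[labels == g].mean()) ** 2)
        for g in np.unique(labels)
    )
    return 1.0 - within / total


def condition_effect_adjusted(log_expr, condition, batch):
    """Per-gene least-squares fit of expression ~ intercept + batch + condition.

    Returns the condition coefficient for every gene, i.e. the
    responder vs non-responder difference after accounting for batch.
    """
    condition = np.asarray(condition, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    columns = [np.ones(len(batch))]
    # One dummy column per batch, dropping the first level as the reference.
    columns += [(batch == b).astype(float) for b in levels[1:]]
    columns.append(condition)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, log_expr, rcond=None)
    return coef[-1]


# Simulated data (assumption): 3 studies, 6 non-responders + 6 responders each.
rng = np.random.default_rng(0)
n_per_group, n_genes = 6, 200
batch = np.repeat(["study_A", "study_B", "study_C"], 2 * n_per_group)
condition = np.tile(np.repeat([0, 1], n_per_group), 3)

# Each study shifts every gene by its own offset; only 10 genes truly respond.
batch_shift = {b: rng.normal(0.0, 2.0, n_genes) for b in np.unique(batch)}
true_effect = np.zeros(n_genes)
true_effect[:10] = 1.5

expr = rng.normal(0.0, 0.5, (len(batch), n_genes))
expr += np.vstack([batch_shift[b] for b in batch])
expr += np.outer(condition, true_effect)

scores, var_ratio = pca_scores(expr)
pc1_batch = r_squared(scores[:, 0], batch)
pc1_condition = r_squared(scores[:, 0], condition)
effect = condition_effect_adjusted(expr, condition, batch)

print("PC1 variance explained by batch:    ", round(pc1_batch, 3))
print("PC1 variance explained by condition:", round(pc1_condition, 3))
print("Mean adjusted effect, true DE genes:", round(effect[:10].mean(), 3))
print("Mean adjusted effect, other genes:  ", round(effect[10:].mean(), 3))
```

In this simulation PC1 tracks the study rather than the responder status, which is the situation the PCA check is meant to reveal, while the batch-adjusted fit still recovers the condition effect on the genes that truly respond. Real data will be messier, especially if responders and non-responders are unevenly distributed across studies, in which case batch and condition become hard to separate.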