Is it correct to pre-select a set of genes to perform differential expression analysis using DESeq2?
7.0 years ago

Is it correct to pre-select a set of genes before performing differential expression analysis with DESeq2? For example, my first comparison would be tumoral vs non-tumoral tissue. I would then take the set of genes I get from it (over 10,000 DE genes) and use only those genes for a second comparison, for example patients who recurred vs patients who did not recur.

RNA-Seq • 1.8k views
7.0 years ago
ATpoint 85k

There was a similar question recently, asking whether removing a large set of genes in order to save computational time is valid. I am not a statistician, but based purely on my (naive) understanding of DESeq2, I would assume that removing a large number of genes, or in your case subsetting to certain genes, might violate the assumptions of DESeq2. In your case, the question is whether the median ratio of the chosen genes will still capture the true size relationships between the samples (e.g. sequencing depth), as this is the basis for the normalization process. In other words, do the chosen genes allow the different samples to be scaled appropriately relative to each other? Why don't you choose the patients of interest based on the first analysis, assign factor levels to them ("recur" / "non-recur"), rerun DESeq2 on the full set of genes, and then check whether your target genes come out as DE?
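To make the normalization concern concrete, here is a minimal sketch of median-of-ratios size factors in plain Python. It is a toy re-implementation of the idea, not DESeq2's own code, and the counts are invented: sample B is assumed to be sequenced at twice the depth of sample A, and the last three genes are assumed to be truly up-regulated fourfold in B.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (toy version).

    counts: dict mapping sample name -> list of raw counts, same gene order.
    Genes with a zero count in any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples (None if any count is 0).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median over genes of count / geometric mean.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Invented counts: 4 unchanged genes followed by 3 truly up-regulated genes.
all_counts = {
    "A": [100, 200, 50, 400, 10, 20, 30],
    "B": [200, 400, 100, 800, 80, 160, 240],
}
# Hypothetical pre-selection: keep only the genes that came out as DE.
de_only = {s: c[4:] for s, c in all_counts.items()}

full = size_factors(all_counts)
subset = size_factors(de_only)

print("B/A scaling, all genes:", round(full["B"] / full["A"], 3))
print("B/A scaling, DE genes only:", round(subset["B"] / subset["A"], 3))
```

With all genes, the B/A scaling recovers the assumed twofold depth difference. With only the pre-selected DE genes, the real expression change is absorbed into the size factors, so the samples are scaled eightfold instead. In real data the effect depends on how balanced the selected genes are, so treat this as an illustration of the risk rather than a prediction.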

7.0 years ago

Pre-selecting genes for differential expression based on differential expression is generally going to be challenging to justify if there is a nested design (samples overlap between test set #1 and test set #2). If these are two different datasets, then perhaps this can be more easily justified.

From a biological point of view, it is quite possible and believable that genes associated with recurrence are not differentially expressed between tumor and normal, so including only those "first" differentially expressed genes in a second comparison may well lead to false negatives.
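One way to see how many genes would be lost is to test recurrence on all genes and then compare the hits with the tumor-vs-normal list. The sketch below uses invented gene names and invented adjusted p-values, and it ignores the fact that pre-selection would also change the multiple-testing correction.

```python
# Hypothetical adjusted p-values from a recurrence test run on ALL genes.
recurrence_padj = {
    "GENE_A": 0.001,
    "GENE_B": 0.20,
    "GENE_C": 0.03,
    "GENE_D": 0.004,
    "GENE_E": 0.70,
}

# Hypothetical genes called DE in the tumor vs non-tumor comparison.
tumor_de_genes = {"GENE_A", "GENE_B", "GENE_E"}

alpha = 0.05  # assumed significance cutoff

# Genes associated with recurrence when every gene is tested.
recurrence_hits = {g for g, p in recurrence_padj.items() if p < alpha}

# Hits that survive pre-selection, and hits that pre-selection would hide.
found_after_preselection = recurrence_hits & tumor_de_genes
missed_by_preselection = recurrence_hits - tumor_de_genes

print("kept:", sorted(found_after_preselection))
print("missed:", sorted(missed_by_preselection))
```

In this toy case two of the three recurrence-associated genes are never tested if the analysis is restricted to the tumor-vs-normal list.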
