How to do a DGE analysis of a list of 1000 genes of interest?
Asked by nattzy94, 4.4 years ago

I am doing a differential gene expression (DGE) analysis of a total RNA-seq dataset with 2 timepoints (5 replicates each). I am particularly interested in changes in the expression of 1000 genes.

So far, I have analysed all genes and then picked out the 1000 genes I am interested in. However, my PI has suggested running the DGE analysis on just those 1000 genes. In theory, this should improve statistical significance, since the multiple-testing correction would then cover only 1000 tests rather than every gene in the dataset.
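
Here is a minimal Python sketch of the arithmetic behind this argument. It assumes the Benjamini-Hochberg (BH) procedure, which is the default `pAdjustMethod` in DESeq2's `results()`. The p-values are made up: one gene with a raw p-value of 1e-5 among otherwise uninformative genes. It only shows that the same raw p-value gets a smaller adjusted value when fewer tests are corrected for. It says nothing about whether restricting the analysis is statistically valid.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values: one gene at 1e-5, the rest uninformative (0.5).
subset = [1e-5] + [0.5] * 999   # 1,000 tests (the list of interest)
full = subset + [0.5] * 19_000  # 20,000 tests (whole transcriptome)
print(bh_adjust(subset)[0])  # about 0.01
print(bh_adjust(full)[0])    # about 0.2
```

By default, DESeq2 also applies independent filtering before the adjustment. This changes the effective number of tests, so real numbers will differ from this toy example.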

Is this an advisable way of doing the analysis? Since DESeq2 models read counts with a negative binomial distribution, wouldn't this mean that most of the 1000 genes I input would end up not being called differentially expressed?

Edit: We arrived at the list of 1000 genes because we are particularly interested in genes encoding small proteins. We searched UniProt for human proteins with a maximum length of 100 amino acids.
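
Here is a minimal Python sketch of turning such a UniProt search into a gene list. It assumes the results were downloaded as a tab-separated file with `Gene Names` and `Length` columns. Check the header of your own export, because column names depend on the columns you select. Two further choices are illustrative assumptions: the first entry in `Gene Names` is taken as the gene symbol, and the example rows are invented. A gene is kept if any of its entries is at most `max_length` residues long.

```python
import csv
import io
from typing import Set


def small_protein_genes(tsv_text: str, max_length: int = 100) -> Set[str]:
    """Gene symbols with at least one protein entry of <= max_length residues.

    Assumes a UniProt-style TSV export with 'Gene Names' and 'Length' columns.
    """
    genes: Set[str] = set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        names = (row.get("Gene Names") or "").split()
        if not names:
            continue  # entry has no annotated gene name
        if int(row["Length"]) <= max_length:
            genes.add(names[0])  # assumption: first listed name is the symbol
    return genes


# Hypothetical rows in the assumed export format.
example = (
    "Entry\tGene Names\tLength\n"
    "ENTRY1\tGENEA ALIAS1\t85\n"
    "ENTRY2\tGENEB\t342\n"
    "ENTRY3\t\t60\n"
)
print(small_protein_genes(example))  # {'GENEA'}
```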

Tags: RNA-Seq, R

Answer, 4.4 years ago

> Is this an advisable way of doing the analysis?

In my opinion, it is not advisable. I would analyse the entire dataset and then check the p-values of your genes of interest, while staying open to other genes that may also be statistically significant.
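
Here is a minimal Python sketch of this approach. It assumes you already have adjusted p-values from the full analysis in a plain dictionary, for example exported from the `padj` column of a DESeq2 results table. The gene names and values are hypothetical. `None` stands in for `NA`, which DESeq2 reports for some genes, such as those removed by independent filtering.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def split_significant(
    padj: Dict[str, Optional[float]],
    genes_of_interest: Iterable[str],
    alpha: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Split genes significant in the full analysis into on-list and off-list hits."""
    wanted = set(genes_of_interest)
    on_list: List[str] = []
    off_list: List[str] = []
    for gene, p in sorted(padj.items()):
        if p is None or p >= alpha:
            continue  # not significant, or NA in the results table
        (on_list if gene in wanted else off_list).append(gene)
    return on_list, off_list


# Hypothetical adjusted p-values from a whole-transcriptome run.
padj = {"GENEA": 0.001, "GENEB": 0.2, "GENEC": None, "GENED": 0.03}
on_list, off_list = split_significant(padj, ["GENEA", "GENEB", "GENEC"])
print(on_list)   # ['GENEA']
print(off_list)  # ['GENED']
```

Because the p-values come from the full analysis, the off-list hits come for free. They are the "other genes" worth staying open to.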

Before normalisation, you can, of course, rigorously filter out low-count genes.
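
Here is a minimal Python sketch of such a count-based pre-filter. Note that it filters by expression level, not by membership in a gene list. The thresholds are assumptions: at least 10 reads in at least 5 samples. The value 5 matches the smallest group in this design, so genes expressed at only one timepoint are kept. The counts are made up.

```python
from typing import Dict, List


def filter_low_counts(
    counts: Dict[str, List[int]],
    min_count: int = 10,
    min_samples: int = 5,
) -> Dict[str, List[int]]:
    """Keep genes with >= min_count reads in at least min_samples samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


# Hypothetical counts: 5 samples at timepoint 1, then 5 at timepoint 2.
counts = {
    "GENEA": [50] * 10,                         # expressed everywhere
    "GENEB": [0, 0, 0, 0, 0, 12, 15, 11, 20, 30],  # expressed at timepoint 2 only
    "GENEC": [3, 0, 1, 2, 0, 4, 1, 0, 2, 5],    # too low everywhere
}
print(sorted(filter_low_counts(counts)))  # ['GENEA', 'GENEB']
```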

Kevin

Answer by Papyrus, 4.4 years ago

In my opinion, this is not an advisable way of doing the analysis. The main problem is how one arrives at the list of interest. In your case, it seems that these 1000 genes were selected a posteriori based on their statistical significance rather than for "biological" reasons, so to me it is hardly justifiable.

Reply:

Thanks for the reply, Papyrus.

The list of 1000 genes was compiled by searching for small proteins: we searched the UniProt database for human proteins with a maximum length of 100 amino acids. Since we are only interested in small proteins in this analysis, would that be sufficient justification?

Reply:

> Since we are only interested in small proteins in the analysis, would this be a sufficient reason?

No. If you really wanted to focus on just those 1000 genes, you could have designed a different type of experiment. However, you chose RNA-seq, so you should stick to the conventions of RNA-seq analysis.

Reply:

I would rather run an enrichment analysis on the full DEG results, to see whether your list of differentially expressed genes is enriched for small proteins. More generally, you can perform pathway-focused analyses (such as GSEA) to see how specific pathways behave in your data.
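
Here is a minimal Python sketch of one way to test for such an enrichment. It uses a one-sided hypergeometric (over-representation) test, which is a simpler technique than GSEA. The background should be the genes actually tested in the full analysis. All counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(n_tested: int, n_small: int, n_de: int, n_de_small: int) -> float:
    """One-sided hypergeometric p-value, P(X >= n_de_small).

    n_tested:   genes tested in the full DGE analysis (the background)
    n_small:    tested genes that encode small proteins
    n_de:       differentially expressed genes
    n_de_small: DE genes that encode small proteins
    """
    upper = min(n_small, n_de)
    tail = sum(
        comb(n_small, k) * comb(n_tested - n_small, n_de - k)
        for k in range(n_de_small, upper + 1)
    )
    return tail / comb(n_tested, n_de)  # exact integer arithmetic until this division


# Hypothetical counts: 800 DE genes out of 15,000 tested; 90 of the DE genes
# are among 1,000 small-protein genes (about 53 would be expected by chance).
print(enrichment_p(15_000, 1_000, 800, 90))
```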

Answer, 4.4 years ago

Don't filter up front, if only so that data from all genes can be used for library-size normalization and dispersion estimation.
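
Here is a minimal Python sketch of why normalization needs the other genes. It uses a simplified version of the median-of-ratios size factors that DESeq2 computes by default. The dispersion trend, which DESeq2 also fits across genes, is not shown. The toy counts are hypothetical: two samples, the second sequenced twice as deeply, and three "genes of interest" that are genuinely up 4-fold on top of that depth difference.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample (simplified DESeq-style)."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # a zero count makes the geometric mean zero, so skip the gene
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of each sample's count / geometric-mean ratios (computed in log space).
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 has twice the sequencing depth of sample 1.
background = {"G1": [100, 200], "G2": [50, 100], "G3": [30, 60],
              "G4": [80, 160], "G5": [10, 20]}
# Genes of interest that are truly 4x up in sample 2 (8x in raw counts).
interest = {"S1": [10, 80], "S2": [20, 160], "S3": [5, 40]}

sf_all = size_factors({**background, **interest})
sf_sub = size_factors(interest)
print(sf_all[1] / sf_all[0])  # about 2: depth difference only
print(sf_sub[1] / sf_sub[0])  # about 8: the real change is normalized away
```

With the subset-only factors, the normalized counts of S1 to S3 come out equal in both samples. A test run on that subset alone would therefore see no change at all.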

You can filter your results list afterwards, if you really want.
