Question

How to do a DGE analysis of a list of 1000 genes of interest?

2

4.8 years ago

nattzy94 ▴ 60

I am doing a DGE analysis of a total RNAseq dataset of 2 timepoints (5 reps each). I am particularly interested in looking for changes in expression of 1000 genes.

Currently, I have done the analysis by analysing all genes and then picking out the 1000 genes I am interested in. However, my PI has suggested that I could try running the DGE analysis on just the 1000 genes. In theory, this should improve statistical significance, since the multiple-testing correction would cover far fewer hypotheses.

Is this an advisable way of doing the analysis? Since differential expression levels are fit to a negative binomial distribution (in the case of DESeq2), wouldn't this just mean most of the 1000 genes I input would end up not being differentially expressed?

Edit: We arrived at the list of 1000 genes because we are particularly interested in genes coding for small proteins. Hence, we searched UniProt for human proteins with a maximum length of 100 amino acids.

RNA-Seq R • 1.3k views

0

4.8 years ago

swbarnes2 14k

Don't filter up front, if only so that you can use data from all the genes for library normalization and dispersion estimates.

You can filter your results list afterwards, if you really want.
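To make the normalization point concrete, here is a minimal Python sketch of median-of-ratios size factors, the general approach DESeq2 uses for library normalization. It is written from scratch and does not call DESeq2; the helper name `size_factors` and the toy counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero count in any sample are skipped, because
    their geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy data: 6 genes x 2 samples. Sample 2 is sequenced
# twice as deeply; the last two genes are also truly up in sample 2.
all_genes = [
    [100, 200],
    [50, 100],
    [30, 60],
    [80, 160],
    [10, 80],   # gene of interest
    [20, 160],  # gene of interest
]
subset = all_genes[4:]  # only the genes of interest

sf_all = size_factors(all_genes)
sf_sub = size_factors(subset)
print("all genes:", sf_all)
print("subset only:", sf_sub)
```

With all genes, sample 2's size factor is twice sample 1's, which matches the depth difference. With the subset alone, the ratio becomes 8, so the real change in the two genes of interest is absorbed into the normalization and disappears. The toy data are deliberately extreme, but this is the risk when the subset is not representative of the library as a whole.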

score 3 · Accepted Answer · 2020-06-23

Is this an advisable way of doing the analysis?

In my opinion, it is not advisable. I would use the entire dataset and then check the p-values of your genes of interest, while being open to other genes that may be statistically significant, too.
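As a rough sketch of what "use the entire dataset, then check your genes of interest" looks like, the Python snippet below adjusts p-values across every tested gene with the Benjamini-Hochberg procedure (the default adjustment method in DESeq2) and only then looks up the genes of interest. The gene IDs, p-values, and the helper `bh_adjust` are hypothetical; in practice the p-values would come from your DGE tool.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = m - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for every tested gene.
pvals = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5, "geneE": 0.9}
genes_of_interest = {"geneB", "geneD"}

# Adjust across ALL genes first, then subset the results.
padj = dict(zip(pvals, bh_adjust(list(pvals.values()))))
of_interest = {g: padj[g] for g in sorted(genes_of_interest)}
print(of_interest)
```

The adjusted values for the genes of interest come from the same correction as every other gene, so nothing outside the list is hidden from view.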

Prior to normalisation, you can, of course, rigorously filter your dataset for low-count genes.
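A low-count filter can be as simple as the Python sketch below. The thresholds are assumptions, not a recommendation: `min_count=10` is arbitrary, and `min_samples=5` mirrors the 5 replicates per timepoint in the question so that a gene expressed at only one timepoint is still kept. The function name and counts are invented for the example.

```python
def filter_low_counts(counts, min_count=10, min_samples=5):
    """Keep genes with at least `min_count` reads in at least `min_samples` samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }

# Hypothetical raw counts: 10 samples, the first 5 from timepoint 1.
counts = {
    "geneA": [0, 1, 0, 2, 0, 0, 1, 0, 0, 3],            # barely detected
    "geneB": [25, 30, 18, 22, 27, 40, 35, 51, 44, 38],  # expressed throughout
    "geneC": [12, 15, 11, 14, 10, 2, 1, 0, 3, 2],       # expressed at timepoint 1 only
}

kept = filter_low_counts(counts)
print(sorted(kept))
```

Note that this filter looks only at count levels, not at which genes you care about, so it does not bias the later tests toward the list of interest.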

Kevin

score 1 · Accepted Answer · 2020-06-23

1

4.8 years ago

Papyrus ★ 3.0k

In my opinion this is not an advisable way of doing the analysis. The main problem is how one arrives at the list of interest. In your case, it seems that these 1000 genes were selected a posteriori based on their statistical significance rather than for "biological" reasons, so to me it is hardly justifiable.

0

Thanks for the reply Papyrus.

The list of 1000 genes was compiled by searching for small proteins: we searched the UniProt database for human proteins with a maximum length of 100 amino acids. Since we are only interested in small proteins in this analysis, would that be a sufficient reason?

4.8 years ago by nattzy94 ▴ 60

0

Since we are only interested in small proteins in the analysis, would this be a sufficient reason?

No. You could have done a different type of experiment if you really wanted to just focus on those 1000 genes. However, you chose RNA-seq and you therefore should stick to conventions in RNA-seq.

4.8 years ago by Kevin Blighe 89k

0

I would rather run an enrichment analysis on the full DGE results, to see whether your list of differentially expressed genes is enriched for small proteins. More generally, you can perform pathway-focused analyses (such as GSEA) to see how specific pathways behave in your data.
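One simple way to ask "are my DEGs enriched for small proteins?" is an over-representation test based on the hypergeometric distribution. Note that this is a different technique from GSEA, which works on a ranked list rather than a DEG cutoff. The Python sketch below uses made-up numbers and a hypothetical helper name; the gene universe should be the genes you actually tested, not the whole genome.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_set, n_deg, n_overlap):
    """One-sided p-value P(X >= n_overlap).

    X is the number of gene-set members among n_deg genes drawn
    without replacement from n_universe tested genes, of which
    n_set belong to the gene set (here: small-protein genes).
    """
    total = comb(n_universe, n_deg)
    upper = min(n_set, n_deg)
    favourable = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_deg - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / total

# Hypothetical numbers: 1000 tested genes, 100 of them small-protein
# genes, 50 DEGs, 10 of which are small-protein genes (5 expected by chance).
p = hypergeom_enrichment_p(n_universe=1000, n_set=100, n_deg=50, n_overlap=10)
print(f"enrichment p-value: {p:.4g}")
```

This keeps the DGE analysis itself genome-wide and treats the small-protein list as a gene set to be tested afterwards.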

4.8 years ago by Papyrus ★ 3.0k