Differential expression for two very different samples
7.4 years ago
I0110 ▴ 160

Standard tools for differential expression analysis (e.g. edgeR and DESeq2) assume that most genes in the samples are equally expressed and that only a small fraction of genes are differentially expressed. I was wondering how we can compare two very different RNA samples, for example one from muscle and the other from liver. I know some people just use a more stringent criterion (e.g. 4-fold difference and FDR < 0.001). Is there a more statistically sound way to do the analysis? Thanks!
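
To make the "more stringent criterion" concrete, here is a minimal Python sketch that applies a Benjamini-Hochberg adjustment to raw p-values and keeps genes with at least a 4-fold change and FDR < 0.001. The function names, gene labels, and numbers are made up for illustration, and this only shows the thresholding step; it does not address the normalization concern raised above.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def stringent_hits(genes, log2_fold_changes, pvalues, min_fold=4.0, max_fdr=0.001):
    """Keep genes with |fold change| >= min_fold and adjusted p-value < max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    min_log2 = math.log2(min_fold)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, fdr)
        if abs(lfc) >= min_log2 and q < max_fdr
    ]

# Toy input (hypothetical genes and values).
genes = ["a", "b", "c", "d"]
log2_fc = [3.0, 1.0, -2.5, 2.0]
pvals = [1e-6, 1e-6, 1e-5, 0.5]
hits = stringent_hits(genes, log2_fc, pvals)
print(hits)
```

In practice the log fold changes and p-values would come from the output of edgeR or DESeq2 rather than being typed in by hand.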

RNA-Seq Statistics • 4.9k views

Standard tools for differential expression analysis (e.g. edgeR and DESeq2) assume that most genes in the sample are equally expressed, and only a small fraction of genes are differentially expressed.

Are you sure about that?

For example: "Still, it is important to keep in mind that even these methods are based on an assumption that most genes are equivalently expressed in the samples." (from https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-91)

Thanks for the citation, I appreciate it! I won't actually believe it until I see it stated by the writers of the tools, but it doesn't seem unlikely.

That said - don't consider my opinion here to be authoritative, but people use those tools all the time for differential expression analysis between tissue types. RNA-seq always seems to be unpredictable and hard to reproduce, though, so I'm not really sure how you would validate that an approach is working correctly.

It's indeed an assumption of DESeq2 and similar tools. Now I'm trying to find a reference for that too...

Please correct me if I am wrong. I guess it is difficult to get "normalized counts" for very different samples. Indeed, most people just go ahead and use these tools with different organs or tissues, but I wonder if there is a better way. :-) Another way to think about this: maybe it is meaningless to look for differentially expressed genes between tissues, since they are already too different.

I guess you could normalize to an a priori selected set of housekeeping genes as a "stable background".
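
A rough Python sketch of that idea, assuming the housekeeping set is chosen beforehand and really is stable across the tissues (which is the hard part): each sample's scaling factor is its total housekeeping count divided by the geometric mean of those totals across samples. The gene names and counts below are placeholders, not a recommendation of which genes to use.

```python
import math

def housekeeping_size_factors(counts, housekeeping_genes):
    """counts maps sample -> {gene: raw count}; returns sample -> scaling factor.

    Assumes every sample has a non-zero total over the housekeeping genes.
    """
    totals = {
        sample: sum(gene_counts[g] for g in housekeeping_genes)
        for sample, gene_counts in counts.items()
    }
    # Use the geometric mean of the totals as the reference so factors center on 1.
    log_mean = sum(math.log(t) for t in totals.values()) / len(totals)
    reference = math.exp(log_mean)
    return {sample: total / reference for sample, total in totals.items()}

def normalize(counts, factors):
    """Divide every raw count by its sample's scaling factor."""
    return {
        sample: {gene: c / factors[sample] for gene, c in gene_counts.items()}
        for sample, gene_counts in counts.items()
    }

# Toy counts (made up); ACTB and GAPDH stand in for the pre-selected set.
counts = {
    "muscle": {"ACTB": 1000, "GAPDH": 3000, "MYH7": 5000},
    "liver": {"ACTB": 500, "GAPDH": 1500, "MYH7": 10},
}
factors = housekeeping_size_factors(counts, ["ACTB", "GAPDH"])
normalized = normalize(counts, factors)
print(factors)
print(normalized["muscle"]["MYH7"], normalized["liver"]["MYH7"])
```

After scaling, the housekeeping genes line up between the two samples by construction, so any remaining difference in the other genes is interpreted relative to that background.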

I know people use a selected set of housekeeping genes as controls for qPCR analysis. Could we do that in RNA-Seq analysis? Could you provide a reference for that? Thanks!

7.4 years ago
h.mon 35k

Although the authors state that most of the genes should not be differentially expressed, I think (and remember reading from one of the authors of one of those packages on some forum) the packages are robust to having a sizeable proportion of truly differentially expressed genes, as long as there are also a lot of non-differentially expressed genes for parameter estimation.

For edgeR, you can adjust the proportion of tags used for parameter estimation, so as to alleviate problems arising from too many truly differentially expressed genes - see the discussion here. In short, use the parameter logratioTrim in the function calcNormFactors().
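
To illustrate why trimming more log-ratios helps, here is a simplified Python sketch of a trimmed mean of log-ratios between one sample and a reference. It is not edgeR's implementation: the real TMM method also trims on average expression and weights genes by precision, and the function name and toy counts here are invented for the example.

```python
import math

def trimmed_log_ratio_factor(sample, reference, logratio_trim=0.3):
    """Simplified TMM-style scaling factor of `sample` relative to `reference`.

    `sample` and `reference` are lists of raw counts for the same genes.
    """
    if not 0 <= logratio_trim < 0.5:
        raise ValueError("logratio_trim must be in [0, 0.5)")
    n_sample, n_reference = sum(sample), sum(reference)
    # Log2 ratio of library-size-scaled counts; skip genes with a zero count.
    m_values = sorted(
        math.log2((s / n_sample) / (r / n_reference))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0
    )
    k = int(len(m_values) * logratio_trim)  # genes dropped from each tail
    kept = m_values[k:len(m_values) - k]
    if not kept:
        raise ValueError("no genes left after trimming")
    return 2 ** (sum(kept) / len(kept))

# Toy data: 8 unchanged genes and 2 genes that are 10-fold up in the sample.
reference = [100] * 10
sample = [100] * 8 + [1000, 1000]
trimmed = trimmed_log_ratio_factor(sample, reference, logratio_trim=0.3)
untrimmed = trimmed_log_ratio_factor(sample, reference, logratio_trim=0.0)
print(trimmed, untrimmed)
```

With trimming, the two strongly up-regulated genes are discarded and the factor is set by the unchanged genes alone; without trimming, they pull the factor away from that value. Raising the trimmed fraction extends the same protection to samples with a larger share of truly different genes, provided enough unchanged genes remain.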

P.S.: you could probably get an answer from the package authors at the Bioconductor support forum: https://support.bioconductor.org

Thanks, h.mon! This link you suggested does provide a nice example.

7.4 years ago
Michele Busby ★ 2.2k

Since the main problem here would be drawing a line through the middle of the genes to normalize the two sets, if you are creating your own data you may want to spike in ERCCs. These are RNA sequences that you would put into each sample in the same quantity to assist with normalization later.
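
As a sketch of how spike-ins could be used afterwards, the Python below computes a median-of-ratios scaling factor per sample from the spike-in counts only. The sample names and counts are invented, and it assumes the same amount of each spike-in went into every sample. DESeq2's estimateSizeFactors() has a controlGenes argument intended for this kind of restriction, but the code here is only an illustration of the idea, not that function.

```python
import math
from statistics import median

def spike_in_size_factors(spike_counts):
    """spike_counts maps sample -> list of counts for the same spike-in transcripts."""
    samples = list(spike_counts)
    n_spikes = len(spike_counts[samples[0]])
    # Log geometric mean per spike-in, using only spike-ins detected in every sample.
    log_geo_means = []
    for j in range(n_spikes):
        values = [spike_counts[s][j] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append((j, sum(math.log(v) for v in values) / len(values)))
    if not log_geo_means:
        raise ValueError("no spike-in is detected in every sample")
    factors = {}
    for s in samples:
        log_ratios = [math.log(spike_counts[s][j]) - lg for j, lg in log_geo_means]
        # The median keeps a single badly behaved spike-in from dominating.
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy spike-in counts (made up): muscle was sequenced about twice as deeply.
spike_counts = {
    "muscle": [200, 400, 800],
    "liver": [100, 200, 400],
}
spike_factors = spike_in_size_factors(spike_counts)
print(spike_factors)
```

Dividing each sample's gene counts by its factor then puts the samples on a common scale that does not depend on how many endogenous genes differ between them.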

In truth, if most of the genes are expressed differently, then the more stringent cutoff probably has more to do with prioritizing genes than with normalization, i.e. with what you can plan on doing with a list of 10K genes.

It is a very good idea to prioritize genes instead of normalizing genes. Thanks, Michele.

7.4 years ago
James Ashmore ★ 3.5k

Have a look at the R package called qsmooth and its associated manuscript. It gives an example very similar to the one in your question.

http://www.biorxiv.org/content/early/2016/11/03/085175

Thanks so much, James! For others interested in this method, a user's guide can be found at https://github.com/stephaniehicks/qsmooth/blob/master/vignettes/qsmooth-vignette.pdf
