Alternatives to TMM normalization
2.7 years ago
Jautis ▴ 580

Hello, I have RNA-seq data with a major perturbation where we expect a large number of genes (>50% of the dataset) to be differentially expressed. I'm aware that this can be a problem for TMM normalization, which assumes that the majority of genes are not DE between samples. However, I haven't seen any other approaches suggested for when this assumption isn't met.

I suspect an alternative may be necessary for our data because, in addition to the large perturbation, we observe systematic differences in the normalization factors between conditions that are independent of library size. This leads to sign changes where the CPM data show genes as down-regulated but the normalized data show the same genes as up-regulated. I understand this could be produced by outlier genes (what TMM is trying to correct for), but given the genome-wide differences I'm unsure whether applying the correction here is appropriate.
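To make the sign change concrete, here is a minimal arithmetic sketch. It assumes the edgeR convention that the effective library size is the library size multiplied by the normalization factor, so the normalized log fold change equals the library-size-only log fold change minus the log2 ratio of the factors. The numbers are invented for illustration and are not taken from the dataset described above.

```python
import math

def normalized_logfc(raw_logfc, nf_a, nf_b):
    """log2 fold change (A vs B) after applying normalization factors.

    Normalized CPM = count / (lib_size * nf) * 1e6, so dividing by nf in
    each condition subtracts log2(nf_a / nf_b) from the CPM-only logFC.
    """
    return raw_logfc - math.log2(nf_a / nf_b)

# Hypothetical values: a gene that looks mildly down-regulated in plain CPM,
# with systematically smaller normalization factors in condition A.
raw = -0.30
adj = normalized_logfc(raw, nf_a=0.75, nf_b=1.05)
print(f"CPM logFC: {raw:+.2f}, normalized logFC: {adj:+.2f}")
```

Any gene whose CPM-only logFC is smaller in magnitude than the shift `log2(nf_a / nf_b)` changes sign in this way, which is why a systematic difference in factors between conditions affects many genes at once.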

Any advice or suggestions regarding a better way to normalize datasets with a major perturbation?

voom expression R limma • 1.0k views
2.7 years ago
ATpoint 85k

You can always inspect the MA-plot to judge how well the normalization performed. If you have a set of control genes that you can confidently identify as probably non-DE, you can run the TMM calculation on these alone and then apply the resulting normalization factors to the full dataset. After all, the edgeR procedure is to calculate CPM based on library size and then correct this for composition with the TMM factors. Alternatively, you can try using only genes with large logCPMs, as these usually tend to be non-DE. Can you show an MA-plot?
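The control-gene idea can be sketched in a few lines. This is a simplified illustration in plain Python, not edgeR's implementation: it takes an unweighted trimmed mean of log ratios on the control genes only, whereas the real TMM also trims on average abundance and uses precision weights. The function names, the toy counts, and the choice of control indices are all hypothetical.

```python
import math

def trimmed_mean_factor(sample, ref, lib_s, lib_r, trim=0.30):
    """Unweighted trimmed mean of log2 ratios, sample vs reference."""
    m = sorted(
        math.log2((ys / lib_s) / (yr / lib_r))
        for ys, yr in zip(sample, ref)
        if ys > 0 and yr > 0          # skip genes with a zero count
    )
    if not m:
        return 1.0
    k = int(len(m) * trim)            # number trimmed from each tail
    kept = m[k:len(m) - k] or m
    return 2 ** (sum(kept) / len(kept))

def control_gene_factors(counts, control_idx, ref_col=0):
    """counts: one list of gene counts per sample, same gene order."""
    lib_sizes = [sum(col) for col in counts]      # full library sizes
    ref = [counts[ref_col][g] for g in control_idx]
    factors = [
        trimmed_mean_factor([col[g] for g in control_idx], ref,
                            lib, lib_sizes[ref_col])
        for col, lib in zip(counts, lib_sizes)
    ]
    # Rescale so the factors multiply to 1.
    gm = math.exp(sum(math.log(f) for f in factors) / len(factors))
    return [f / gm for f in factors], lib_sizes

def cpm(counts, factors, lib_sizes):
    """CPM on all genes using effective library size = lib_size * factor."""
    return [[y / (lib * f) * 1e6 for y in col]
            for col, f, lib in zip(counts, factors, lib_sizes)]

# Toy data: genes 0-2 are controls (unchanged), genes 3-5 are strongly up
# in the second sample.
counts = [[100, 200, 300, 100, 100, 100],
          [100, 200, 300, 1000, 1000, 1000]]
factors, libs = control_gene_factors(counts, control_idx=[0, 1, 2])
norm = cpm(counts, factors, libs)
```

In this toy case the control genes end up with equal normalized CPM in both samples, while the three perturbed genes keep their tenfold difference. The factors are estimated on the control subset but applied to the full count matrix, which is the "feed this back" step described above.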


This is a great (AT)point. No need to go all the way to pathway analysis (4. in my answer below) when routine aspects of the QC pipeline should be giving related information, so long as the analyst is comfortable interpreting it.


Awesome, thank you both for the advice!

2.7 years ago
LauferVA 4.5k

1. TL;DR:

While I would definitely still try RLE, my hunch here would be to go with MRN (see the quotation below).

"Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from relative size of studied transcriptomes."

2. Leverage available reviews comparing performance of these algorithms

Of course, there is no reason to guess or speculate blindly, as I have done above. Rather, you can run all the methods and compare for yourself; in addition, you have literature to fall back on. This manuscript is a bit old, but it should give you some ideas. It also cites additional manuscripts containing simulations in which one or another method performs best.
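As one concrete point of comparison, the RLE idea mentioned in point 1 (median-of-ratios size factors, as popularized by DESeq) can be sketched as follows. This is a minimal plain-Python illustration with invented counts, not the code of any particular package. Note that it takes a median across genes, so it also relies on roughly half or more of the genes having a stable ratio.

```python
import math
from statistics import median

def rle_size_factors(counts):
    """counts: one list of gene counts per sample, same gene order.

    Returns one size factor per sample: the median, over genes, of the
    ratio between the sample's count and the gene's geometric mean
    across samples. Genes with a zero in any sample are skipped.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        vals = [col[g] for col in counts]
        if all(v > 0 for v in vals):
            log_geo_means.append(sum(math.log(v) for v in vals) / len(vals))
        else:
            log_geo_means.append(None)
    factors = []
    for col in counts:
        log_ratios = [math.log(col[g]) - lgm
                      for g, lgm in enumerate(log_geo_means)
                      if lgm is not None]
        # median() raises if every gene was skipped; acceptable for a sketch.
        factors.append(math.exp(median(log_ratios)))
    return factors

# Toy data: the second sample is sequenced twice as deeply, and the last
# gene is additionally up-regulated.
toy = [[10, 20, 30, 40, 50],
       [20, 40, 60, 80, 500]]
size_factors = rle_size_factors(toy)
```

Here the ratio of the two size factors recovers the twofold depth difference because the single perturbed gene does not move the median. Running the same data through each candidate method and comparing the resulting factors is one simple way to see how much the choice matters for a given dataset.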

3. A meta-analytic approach to limit the influence of this signature on the results

Do you have additional data processed in a similar way? In particular, identically generated data differing only in the condition that produces the perturbation would be exceedingly valuable here. If the signature you describe is strong enough, you might get a fairly strong boost in statistical power by including other (very similar) samples you have and processing them all at the same time. I've gotten remarkable increases in statistical / discriminative power by doing this before.

4. WGCNA and/or pathway analysis to sanity check the results you obtain

If, at the end of everything, you're not sure how well you were able to compensate for the perturbation, consider assessing your results by looking at the pathways you are most certain should behave in a certain way. Ultimately, even if you're worried about normalization, if everything makes sense at the end and it helps teach you more about your biology, it may not have mattered.
