Question

Use Spike-Ins or TMM-normalization

0

3.4 years ago

Barista ▴ 50

Hi all,

Sorry for all my questions lately, but as a novice who has to figure out how to analyse QuantSeq data, I have found this forum a great and indispensable help.

I'm doing a human transcriptomics analysis with QuantSeq data from 600 patients. They fall into two groups of 300, each with a condition that is similar to the other in many aspects but different in some. I have already followed the edgeR manual, including TMM normalization among other things. But now I noticed that there are also ERCC/SIRV spike-ins in the dataset!

I did some literature research and found that normalization with spike-ins is a possibility, but that it shows mixed performance in some cases and is not always as accurate. Furthermore, TMM seems to be the preferable way IF its assumptions are fulfilled (namely that most genes are not DE and that up- and down-regulation among the DE genes is roughly symmetric). But my problem now is: how can I be sure that those assumptions are fulfilled in my experiment? As said, the two conditions of interest are comparable in certain aspects (they lead to the same clinical syndrome) but have different etiologies. I tend to favour TMM for my experiment, as I expect only some genes to be up- or down-regulated between the two conditions. In other words, I do not expect samples from one condition to be totally different from those of the other.

Does anyone have some advice on this matter? Could I just continue as I have done so far (with the extra step of excluding the spike-ins), or are spike-ins strongly advised?

normalization edgeR • 1.3k views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 3.4 years ago by Barista ▴ 50

score 2 · Answer 1 · 2021-08-23

2

3.4 years ago

ATpoint 86k

TMM is very robust in most situations; it takes substantial global changes in expression to break it. This is actually what plotMD, which you asked about before, can be used for. Just run the DE analysis and make such a plot based on the topTags output; it shows the log fold change between the two groups against the average expression. You will easily see whether the assumptions hold (most likely they do): the MD (aka MA) plot should have the typical arrowhead-like shape, with the bulk of genes centered along y = 0. If the bulk is centered roughly at y = 0, you're fine, and there is most likely no need for any spike-in controls. You can even make the plot manually by putting logCPM on the x-axis and logFC on the y-axis. Just follow the edgeR vignette, e.g. page 57/58.
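For anyone who wants to see what the manual logCPM-vs-logFC check amounts to numerically, here is a minimal sketch in plain Python. It is not edgeR's implementation and does not perform TMM: it scales by raw library size only, and the function names `md_points` and `bulk_summary`, the `prior` parameter and the toy counts are all made up for illustration. In a real analysis the logCPM and logFC values would come from the topTags table instead.

```python
import math
from statistics import median


def md_points(group_a, group_b, prior=1.0):
    """Return one (average log2-CPM, log2 fold change B vs A) pair per gene.

    group_a / group_b hold per-gene total counts for the two groups.
    Assumption: plain library-size scaling, NOT TMM.
    """
    lib_a, lib_b = sum(group_a), sum(group_b)
    mean_lib = (lib_a + lib_b) / 2
    points = []
    for a, b in zip(group_a, group_b):
        # The prior count is scaled with library size so it does not
        # bias the fold change; it also avoids log2(0) for zero counts.
        cpm_a = (a + prior * lib_a / mean_lib) / lib_a * 1e6
        cpm_b = (b + prior * lib_b / mean_lib) / lib_b * 1e6
        log_a, log_b = math.log2(cpm_a), math.log2(cpm_b)
        points.append(((log_a + log_b) / 2, log_b - log_a))
    return points


def bulk_summary(points):
    """Summarise where the bulk of genes sits on the y-axis (logFC)."""
    log_fc = [m for _, m in points]
    n = len(log_fc)
    return {
        "median_logFC": median(log_fc),
        "frac_up": sum(v > 0 for v in log_fc) / n,
        "frac_down": sum(v < 0 for v in log_fc) / n,
    }


# Toy data: group B is simply sequenced twice as deep as group A,
# so the bulk should sit at logFC ~ 0.
toy_a = [100, 200, 300, 400, 500]
toy_b = [200, 400, 600, 800, 1000]
points = md_points(toy_a, toy_b)
summary = bulk_summary(points)
print(summary["median_logFC"])
```

A median logFC close to 0, with similar fractions of genes above and below it, is the numerical counterpart of "the bulk is centered at y = 0" in the plot. A clearly shifted median would be a hint that the normalization assumptions deserve a closer look.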

ADD COMMENT • link 3.4 years ago by ATpoint 86k

0

Ah, that totally makes sense. By plotting the topTags object, I assume you mean plotting the object generated by glmQLFTest (which I named glf)?

This shows the typical arrowhead shape, with what seems to be a symmetrical distribution around the x-axis. Is that the correct interpretation?

[Image: MD plot of the glf object]

ADD REPLY • link 3.4 years ago by Barista ▴ 50

0

Yeah, that looks very normal. You should be good to go with these results in terms of normalization.

ADD REPLY • link 3.4 years ago by ATpoint 86k