Hi all,
Sorry for all my questions lately, but as a novice who has to figure out how to analyse QuantSeq data, I have found this forum a great and indispensable help.
I'm doing a human transcriptomics analysis with QuantSeq data from 600 patients, split into two groups of 300 whose conditions are quite similar in many respects but differ in others. I have already followed the edgeR user's guide, including TMM normalization among other steps. But now I have noticed that the dataset also contains ERCC/SIRV spike-ins!
From some literature research I found that normalization with spike-ins is an option, but that it shows mixed performance in some cases and is not always accurate. TMM, on the other hand, seems preferable IF its assumptions hold (namely, that most genes are not DE, that DE and non-DE genes behave the same, and that differential expression is roughly symmetric, i.e. about as many genes go up as down). My problem is: how can I be sure these assumptions hold in my experiment? As I said, the two conditions of interest are comparable in certain respects (they lead to the same clinical syndrome) but have different etiologies. I tend to favour TMM here, as I expect some genes to be up- or down-regulated in both conditions; in other words, I do not expect the samples from one condition to be totally different from those of the other.
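For anyone wondering why the "most genes are not DE" assumption matters, here is a rough sketch of what TMM does. It is written in Python with numpy purely for illustration; the language and the function names `tmm_factor` and `calc_norm_factors` are my own assumptions, and in practice you would simply call edgeR's `calcNormFactors` in R. Each sample is compared with a reference sample, genes with extreme log-ratios (M) or extreme average expression (A) are trimmed, and the factor is a precision-weighted mean of the remaining log-ratios. Ties are broken arbitrarily rather than averaged as in R, and other edge cases are handled more crudely than in edgeR.

```python
import numpy as np


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of one sample (obs) relative to a reference sample (ref).

    Simplified sketch of the idea behind edgeR's TMM method, not a port of it.
    """
    obs = np.asarray(obs, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n_obs, n_ref = obs.sum(), ref.sum()

    # Only genes observed in both samples give finite log-ratios.
    both = (obs > 0) & (ref > 0)
    o, r = obs[both], ref[both]
    p_o, p_r = o / n_obs, r / n_ref

    m = np.log2(p_o / p_r)          # M: log2 fold change between the samples
    a = 0.5 * np.log2(p_o * p_r)    # A: average log2 expression
    # Approximate variance of M; its inverse is used as a precision weight.
    v = (n_obs - o) / (n_obs * o) + (n_ref - r) / (n_ref * r)

    n = m.size
    m_rank = m.argsort().argsort()  # 0-based ranks, ties broken arbitrarily
    a_rank = a.argsort().argsort()
    k_m = int(np.floor(n * logratio_trim))
    k_a = int(np.floor(n * sum_trim))
    # Trim the most extreme M values (likely DE genes) and the most extreme A values.
    keep = ((m_rank >= k_m) & (m_rank <= n - 1 - k_m)
            & (a_rank >= k_a) & (a_rank <= n - 1 - k_a))
    if not keep.any():
        return 1.0

    # Precision-weighted mean of the remaining M values, back on the linear scale.
    w = 1.0 / v[keep]
    return float(2.0 ** (np.sum(w * m[keep]) / np.sum(w)))


def calc_norm_factors(counts):
    """Factors for a genes x samples count matrix, rescaled to geometric mean 1."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    # Reference: sample whose upper-quartile proportion is closest to the mean one.
    f75 = np.quantile(counts / lib, 0.75, axis=0)
    ref = counts[:, int(np.argmin(np.abs(f75 - f75.mean())))]
    f = np.array([tmm_factor(counts[:, j], ref) for j in range(counts.shape[1])])
    return f / np.exp(np.mean(np.log(f)))
```

As a toy example, take a reference sample with 20 genes at 100 counts each and a second sample that is identical except that one gene has 10,000 counts. Its raw library size is 11,900, but the trimmed mean ignores the outlier and returns a factor of about 0.168 (2000/11900), so the effective library size comes back to roughly 2,000. If a large share of genes shifted in the same direction, the genes left after trimming would no longer be mostly non-DE and the factor would be biased, which is exactly the assumption in question.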
Does anyone have advice on this? Can I simply continue as I have so far (with the extra step of excluding the spike-ins), or is spike-in normalization strongly advised?
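If you go ahead without spike-in normalization, the exclusion step just means dropping the spike-in rows before library sizes and TMM factors are computed, so the spike-ins count towards neither. In edgeR that is a row subset of the DGEList with `keep.lib.sizes=FALSE`; the Python sketch below only illustrates the idea. The prefixes `ERCC-` and `SIRV` are an assumption based on the usual naming of ERCC and Lexogen SIRV controls, and `drop_spike_ins` is a hypothetical helper, so check the IDs in your own count table first.

```python
import numpy as np

# Assumed prefixes: ERCC controls are usually named like "ERCC-00002" and
# Lexogen SIRVs start with "SIRV"; verify against your own annotation.
SPIKE_PREFIXES = ("ERCC-", "SIRV")


def drop_spike_ins(gene_ids, counts, prefixes=SPIKE_PREFIXES):
    """Remove spike-in rows from a genes x samples count matrix.

    Returns the kept gene IDs, the filtered count matrix and the number of
    rows that were dropped.
    """
    counts = np.asarray(counts)
    # str.startswith accepts a tuple, so one call checks every prefix.
    is_spike = np.array([str(g).startswith(prefixes) for g in gene_ids], dtype=bool)
    kept_ids = [g for g, s in zip(gene_ids, is_spike) if not s]
    return kept_ids, counts[~is_spike], int(is_spike.sum())
```

Library sizes should then be recomputed from the filtered matrix rather than carried over from the unfiltered one.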
Ah, that totally makes sense. By plotting the topTags object, I assume you mean plotting the object generated by glmQLFTest (which I named glf)?
This shows the typical arrowhead shape, with what looks like a symmetric distribution around the x-axis. Am I interpreting this correctly?
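Besides eyeballing the plot, a rough numerical check is to look at the median logFC within bins of average expression (it should stay near 0 across the range) and at how many significant genes go up versus down. The sketch below assumes the full results table has been exported from R, e.g. with `topTags(glf, n = Inf)`, and its `logFC`, `logCPM` and `FDR` columns loaded as arrays; the Python functions, the 10 bins and the 0.05 FDR cut-off are my own illustrative choices, not an edgeR feature.

```python
import numpy as np


def binned_median_logfc(logcpm, logfc, n_bins=10):
    """Median log-fold change within quantile bins of average expression (logCPM).

    For a well-normalized comparison the medians should sit close to 0 across
    the whole expression range, matching a symmetric MD plot.
    """
    logcpm = np.asarray(logcpm, dtype=float)
    logfc = np.asarray(logfc, dtype=float)
    edges = np.quantile(logcpm, np.linspace(0.0, 1.0, n_bins + 1))
    # Assign each gene to a bin; the maximum value is clipped into the last bin.
    idx = np.clip(np.searchsorted(edges, logcpm, side="right") - 1, 0, n_bins - 1)
    medians = []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():  # tied quantile edges can leave a bin empty
            medians.append(float(np.median(logfc[in_bin])))
    return medians


def up_down_counts(logfc, fdr, alpha=0.05):
    """Number of significant up- and down-regulated genes at an FDR cut-off."""
    logfc = np.asarray(logfc, dtype=float)
    fdr = np.asarray(fdr, dtype=float)
    sig = fdr < alpha
    return int(np.sum(sig & (logfc > 0))), int(np.sum(sig & (logfc < 0)))
```

An imbalance between the up and down counts is not proof of a normalization problem (the biology may genuinely be asymmetric), but it is a reason to look more closely, whereas medians close to 0 in every bin are consistent with the symmetric arrowhead shape.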
Yeah, that looks very normal. You should be good to go with these results in terms of normalization.