Hello!
I am currently using DESeq2 to compare a group of 32 ovine tumors against 3 healthy controls. 80% of the tumor samples are technical replicates. I ran the DESeq analysis both with and without the function collapseReplicates(dds, groupby = colData(dds)$Sample, renameCols = TRUE).
In both cases the MA plot had an unusual shape, shown here together with the PCA and dispersion plots:
Moreover, 272 genes (without collapsing) and 695 genes (with collapsing) were flagged as outliers.
I have already run other DESeq analyses with the same code on smaller datasets and have never seen such a distribution. Could someone tell me whether this is a normal distribution for tumor samples, or for this number of samples? If not, what could the reasons be, and what can I do?
Thanks in advance!
Vincent
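For readers following along, here is a minimal sketch of the collapsing step and of counting flagged genes. It is not the poster's actual code: the data are simulated with makeExampleDESeqDataSet(), and the `Sample` column (pairs of columns treated as technical replicates) is a hypothetical layout chosen for illustration.

```r
library(DESeq2)

# Simulated stand-in for the real experiment (assumption):
# 1000 genes, 12 columns, conditions "A" (first 6) and "B" (last 6).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical layout: each consecutive pair of columns is one biological
# sample sequenced twice (technical replicates).
dds$Sample <- factor(rep(paste0("S", 1:6), each = 2))

# Sum the counts of technical replicates into one column per sample.
collapsed <- collapseReplicates(dds, groupby = dds$Sample, renameCols = TRUE)

# Collapsing sums counts, so per-gene totals are unchanged.
stopifnot(ncol(collapsed) == 6)
stopifnot(all(rowSums(counts(dds)) == rowSums(counts(collapsed))))

# Run the analysis on the collapsed object and inspect the results.
collapsed <- DESeq(collapsed)
res <- results(collapsed)
summary(res)  # the summary includes a line reporting flagged outliers

# Rough count of genes with a p-value set to NA despite non-zero counts.
n_flagged <- sum(is.na(res$pvalue) & res$baseMean > 0)
```

Running the same steps on the uncollapsed object gives the second set of numbers to compare against. Since technical replicates are not independent samples, the collapsed version is usually the more defensible of the two.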
I think the asymmetry in your MA plot comes from a rather unbalanced design: you are comparing many heterogeneous tumor samples against a few very similar control samples. There could also be a difference in sequencing depth between the samples. Is that the case?
Basically, what I think is happening is that, for some reason, it is much easier to call downregulation than upregulation in your comparison. Is it right that the downregulated genes in your MA plot are more highly expressed in the tumor samples than in the controls, or is it the opposite? If the former is true, I guess that is because a lot of genes are not expressed in the controls but are expressed in some of the tumor samples at different levels. That would explain the lines on the left part of the plot.
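One way to check for the pattern described above (genes silent in every control but expressed in only some tumors) is to count them directly in the normalized count table. The sketch below uses a small hypothetical matrix, not data from this thread, and assumes control columns can be recognized by a `ctrl` prefix.

```r
# Hypothetical toy counts (not data from this thread): rows are genes,
# columns are 2 controls followed by 4 tumors.
norm_counts <- matrix(
  c(0, 0,  12, 250,  0,  90,
    5, 7,   6,   4,  9,   5,
    0, 1, 300,   0, 45, 800,
    0, 0,   0,   0,  0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("ctrl1", "ctrl2", paste0("tumor", 1:4)))
)

# Assumption: control samples are identified by their column name prefix.
is_control <- grepl("^ctrl", colnames(norm_counts))

# Genes with zero counts in every control sample.
silent_in_controls <- rowSums(norm_counts[, is_control, drop = FALSE]) == 0

# Number of tumor samples in which each gene has a non-zero count.
tumors_expressing <- rowSums(norm_counts[, !is_control, drop = FALSE] > 0)

# Genes absent from the controls but detected in at least one tumor.
candidates <- silent_in_controls & tumors_expressing > 0
names(which(candidates))
```

With a real DESeqDataSet, the matrix would come from counts(dds, normalized = TRUE). A large number of such genes, each detected in only a fraction of the tumors, would be consistent with the explanation given above.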
The size factors range from 0.4094 to 2.5580, with an average of 1.14, so I don't think depth is really an issue.
As for downregulation vs upregulation, my controls do indeed have a lot of genes expressed at a low level (1-10 reads) compared with the tumors (10-1000 reads). You can even see it by eye when scrolling through the normalized count table.
What do you think I could do to obtain a better-shaped MA plot? Is this common when analyzing tumors with DESeq?
Thanks for your suggestions. I will try to modify the design as suggested.
As for the size factors of the controls, they are all around 1.2, so not really different from the majority of the tumors.
A difference between 0.4 and 2.5 in size factors is quite large: it means there is a roughly 6-fold difference in sequencing depth between two samples. So am I right in assuming that the control samples have a lower depth than the tumor samples? Is this correctly reflected by the size factors?
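The fold difference quoted above follows directly from the two extreme size factors reported earlier in the thread. A quick check in base R:

```r
# Extremes of the size factors reported in this thread.
sf_min <- 0.4094
sf_max <- 2.5580

# Ratio between the deepest and the shallowest sample.
fold <- sf_max / sf_min
round(fold, 2)  # 6.25
```

To see whether the spread lines up with tumor vs control status, one option (assuming the grouping column in colData is named `condition`) is tapply(sizeFactors(dds), dds$condition, summary), which summarizes the size factors per group.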