Question

MA plots DESeq: strange MA plots

3

9.4 years ago

VHahaut ★ 1.2k

Hello!

I am currently using DESeq2 to compare a group of 32 ovine tumors against 3 healthy controls. 80% of the tumor samples are technical replicates. I ran the analysis both with and without the function collapseReplicates(dds, groupby= colData(dds)$Sample, renameCols=T).

In both cases the MA plot had a weird shape looking like this (+PCA and dispersion plots):

[Figures: MA plot, dispersion plot, PCA plot]

Moreover, 272 genes (without collapsing) and 695 genes (with collapsing) have been flagged as outliers.

I have already run other DESeq analyses with the same code on smaller datasets and have never seen such a distribution. Could someone tell me whether this is a normal distribution for tumor samples or for this number of samples? If not, what could be the reasons, and what can I do?

Thanks in advance!

Vincent

RNA-Seq MAplot DESeq • 5.6k views

updated 24 months ago by Ram 44k • written 9.4 years ago by VHahaut ★ 1.2k

1

I think the reason you see such an asymmetry in your MA plot is that your design is rather unbalanced: you are comparing many heterogeneous tumor samples against a few very similar control samples. There could also be a difference in sequencing depth between the samples. Is that the case?

Basically, what I think is happening is that, for some reason, it is much easier to call downregulation than upregulation in your comparison. Is it right that the downregulated genes in your MA plot are more highly expressed in the tumor samples than in the controls, or is it the opposite? If the former is true, I guess that's because a lot of genes are not expressed in the controls but are expressed in some of the tumor samples at different levels. That's why you get lines of points on the left part of the plot.

updated 24 months ago by Ram 44k • written 9.4 years ago by Martombo ★ 3.1k
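
To illustrate why genes that are absent in one group can line up on an MA plot, here is a minimal sketch using the classic definition (A is the mean of the two log2 values, M is their difference). The pseudocount of 1 and the toy counts are assumptions made for illustration; DESeq2's own MA plot uses the mean of normalized counts on the x-axis and model-based fold changes on the y-axis, so the real curves will look different. The sign of M also depends on which group is the numerator.

```python
import math

PSEUDOCOUNT = 1  # assumption: avoids log2(0); this is not a DESeq2 setting


def ma_point(tumor_count, control_count, pseudo=PSEUDOCOUNT):
    """Return (A, M): mean log2 expression and log2 ratio tumor/control."""
    t = math.log2(tumor_count + pseudo)
    c = math.log2(control_count + pseudo)
    return (t + c) / 2, t - c


# Hypothetical genes with 0 reads in the control and 1..8 reads in the tumor.
points = [ma_point(k, 0) for k in range(1, 9)]

for a, m in points:
    print(f"A={a:.2f}  M={m:.2f}")
```

With a control count of 0, every point satisfies M = 2A, so these genes fall on one straight line at the low-expression end of the plot instead of scattering around M = 0.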

0

The sizeFactors range from 0.4094 to 2.5580, with an average of 1.14, so I don't think that depth is really an issue.

As for downregulation vs upregulation, my controls do indeed have a lot of genes that are expressed at low levels (1-10 reads) compared with the tumors (+10-1000). You can even see it by eye when scrolling through the normalized count table.

What do you think I could do to obtain a better-shaped MA plot? Is this common when analyzing tumors with DESeq?

updated 24 months ago by Ram 44k • written 9.4 years ago by VHahaut ★ 1.2k

2

What you could do to improve the analysis is remove genes whose mean expression is very low; I can see from the MA plot that many have a mean below 10. Another possibility would be to compare single tumor samples to the controls, or to add a tumor variable to the design. That would reduce the dispersion you get, which is quite high. A final suggestion would be to downsample the BAM files of your deepest samples to reduce the difference in depth.

9.4 years ago by Martombo ★ 3.1k
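
The first suggestion (dropping genes with a very low mean) can be sketched on a toy table as follows. The gene names, the counts, and the cutoff of 10 are all assumptions for illustration. The cutoff is simply the value mentioned in the reply, not a recommended default, and in a real analysis the filter would be applied to the actual count matrix before testing.

```python
from statistics import mean

# Hypothetical toy count table: gene -> counts per sample (not data from the thread).
counts = {
    "geneA": [0, 1, 0, 2, 1],
    "geneB": [15, 40, 22, 9, 31],
    "geneC": [3, 12, 8, 5, 7],
    "geneD": [250, 310, 180, 400, 275],
}

MIN_MEAN = 10  # assumption: cutoff taken from the "lower than 10" remark


def filter_low_expression(table, min_mean):
    """Keep only genes whose mean count across samples is at least min_mean."""
    return {gene: values for gene, values in table.items() if mean(values) >= min_mean}


kept = filter_low_expression(counts, MIN_MEAN)
print(sorted(kept))
```

Here geneA (mean 0.8) and geneC (mean 7) are dropped, while geneB and geneD are kept.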

0

Thanks for your suggestions. I will try to modify the design as suggested.

As for the size factors of the controls, they are all around 1.2, so not really different from the majority of the tumors.

updated 24 months ago by Ram 44k • written 9.4 years ago by VHahaut ★ 1.2k

0

A difference between 0.4 and 2.5 in size factors is quite high. It means there is roughly a 6-fold difference in sequencing depth between two samples. So am I right in assuming that the control samples have a lower depth than the tumor samples? Is this correctly reflected by the size factors?

updated 24 months ago by Ram 44k • written 9.4 years ago by Martombo ★ 3.1k
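
The "6-fold" figure is just the ratio of the two extreme size factors quoted earlier in the thread. A quick check, treating size factors as a rough proxy for sequencing depth (which is how the reply reads them, and is itself an approximation):

```python
# Size factor extremes quoted in the thread.
smallest = 0.4094
largest = 2.5580

# Ratio of the largest to the smallest size factor.
fold_difference = largest / smallest
print(f"fold difference: {fold_difference:.2f}")
```

This gives a ratio of about 6.25 between the two most extreme samples. It says nothing about which group those samples belong to.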