Question

Large-scale bulk RNAseq Experiment with abnormal MA-plots for apoptotic conditions

0

Entering edit mode

2.2 years ago

Nico • 0

Hi everyone,

We performed a large bulk RNAseq experiment on an iPSC - line. Cells were treated and harvested in 96 well plates and automatically processed following the SmartSeq2 protocol by a core unit. We have roughly 60 conditions randomly assigned throughout the plate and multiple control wells / plate in a blocked experimental design (Each sample only appears once at the same position / plate). We have four replicates/sample.

Pre-processing: bcl2fastq -> STAR -> HTSeq. FASTQC shows high quality reads for all samples.

For the analysis I only include coding genes, remove genes with rowsum < 500 (with a total of ~ 360 samples) which leads to a matrix with ~14k genes. I use DESeq2 for differential expression analysis. I include surrogate variables from SVA as covariates to the model (SVA, n_sv = 4), to get rid of an initially present batch effect.

After running DESeq2 I investigate the MA-plots and make the observation that conditions with increased cell death (cells were counted in 96 well plates before harvested fro SmartSeq) show an abnormal MA plot:

MA plot of an apoptotic condition

Other samples with unchanged viability show a balanced, expected MA plot

MA plot of condition without apoptosis

I took the following steps to investigate the situation:

I extracted the signature from the MA-plots by "manual gating" and could confirm the the genes that appear as the upregulated population are mostly the same across different apoptotic conditions. Pathway annotation shows the "Apoptosis" hallmark pathway from msigdb significantly enriched in this gene population.
I thought about possible contamination, but the apoptotic conditions match what we expect and are randomly distributed over the 96well plates, which makes a contamination extremely unlikely.
I looked at the sample to sample correlation plots. interestingly I do see slight deviations of the "manually extracted" group of consistently upreglated genes, but this phenomenon does not correlate with viability (or the presence of the abnormal MA plot). Batch correction nearly completely gets rid of this observation.
I ran DESeq2 without surrogate variables which also leads to a similar MA-plot pattern'

Some additional thoughts:

The cell line we are using can be 'activated' and exist in different states. Could a different state of activity in a cell induce such a MA-plot pattern?
Considering the fact that this is an iPSC-line, could potential de-differentiation account for such an MA-plot pattern?

My question:

Is there a valid biological explanation for this observation?

MA-plot DESeq2 Smartseq2 • 1.7k views

ADD COMMENT • link 2.2 years ago by Nico • 0

0

Entering edit mode

What happens when you remove the SVA covariates?

ADD REPLY • link 2.2 years ago by Asaf 10k

0

Entering edit mode

I did rund DESeq2 without surrogate variables, the MA-plot abnormalities were still present.

ADD REPLY • link 2.2 years ago by Nico • 0

0

Entering edit mode

Exvessive logFC with low baseMean is usually due to poor or no prefiltering. I personally find rowsum/rowmean-based filtering suboptimal. Do it group-aware, see for example https://support.bioconductor.org/p/9152700/#9152701

ADD REPLY • link 2.2 years ago by ATpoint 89k

0

Entering edit mode

Thanks for your input! I will definetly look into that. But as you can see from the MA plot, not only low baseMean genes are affected here. Hence I don't think I'll be able to get rid of this just by adjusting the pre-filter.

ADD REPLY • link 2.2 years ago by Nico • 0

score 0 · Answer 1 · 2023-06-27

As ATpoint already said, above, first make sure all the standard QC steps (including, but not limited to, gene-level QC (e.g. prefiltering) and sample-level QC (e.g. sample distance matrix visualization & PCA plot generation to identify outlying samples, in particular samples that lie far from other samples in the same group(s)) are solid. All of that is covered in the vignette as I expect you already know.

Assuming this has been done, you've already taken what I think is likely the next most important step, i.e., identify commonalities between the markers that cluster in the log2FC high group you seem to be referring to.

For this MA plot, I might isolate all logFCs greater than 2.5, but less than 4, then pull the gene symbols for these. Next, I'd enter that list of gene symbols into a tool like DAVID, to see if they have any biologic coherence. If these genes are known to be co-expressed in physiologic states (here, apoptosis), and if the co-expression relationships mimic what can be observed in other apoptotic cells of similar type, it's more likely a real finding...

The problem is, in this case, apoptosis is a tricky state in this regard. Lots of machinery that can affect RNA-seq results is activated, including mechanisms to destroy RNA as well as changes to cell permeability, for instance. As a result, I'd want to know exactly why these genes are LFC up, and what exactly they have to do with apoptosis. As an example, consider: are they LFCup because they are NOT being degraded in non-apoptotic samples, or are they LFCup because they are upregulated in apoptotic samples. To do this, could consider generating countPlots of genes you understand very well, then sanity checking that the direction of effect makes sense and is consistent with the known biology displayed by "normal" intact cells of that type beginning apoptosis...

For these reasons, I do think you are right to be cautious. While I've seen many examples where far more genes are LFCup than LFCdown, or the reverse, I have never seen a case quite like this before. Finally, you can also expect to get a question like this from a reviewer if you include the MAplot. So, irrespective I'd make it a point to control for the batches you indicate and to be able to defend the results.