Question

RNA-seq differentially expressed genes (DEGs) with low fold change values

0

Entering edit mode

5 months ago

singlecell_bio • 0

Hello everyone,

I have a question concerning RNA-seq analysis, particularly about the differentially expressed genes. To provide some background, we used three biological replicates for each group where cell lines exposed to arsenic or not for a specific time period. We dealt with batch effects, as two replicates from each group were sequenced later. To mitigate these batch effects, we employed ComBat_seq, which significantly reduced the batch effect. We also implemented a count threshold to filter out lowly expressed genes across the groups, setting a minimum of 10 counts in each group. However, the resulting volcano plots looked quite unusual, with most of the differentially expressed genes (DEGs) falling within a log2 fold change of 1. I'm unsure how to interpret this. The FASTQC reports were satisfactory, and all samples showed more than 85% of mapped reads. Could this pattern be related to the use of ComBat_seq, where transcript count levels might have been altered by the algorithm to correct for batch effects, or might there be another explanation? Thank you!

PCA Plot after batch correction:

image: PCA Plot after batch correction

Volcano plot showing DEGs with no lfcshrinkage:

image: Volcano plot showing DEGs with no lfcshrinkage

Volcano plot showing DEGs with lfcshrinkage:

image: Volcano plot showing DEGs with lfcshrinkage

ComBat-seq RNA-seq DESeq2 DEGs • 1.1k views

ADD COMMENT • link updated 5 months ago by swbarnes2 14k • written 5 months ago by singlecell_bio • 0

0

Entering edit mode

There is a good number of DEGs in the top plot beyond abs(log2FC) > 1. The shape of that plot generally looks better save for the compression in the middle because of your X-axis scale (see below). There are savvier RNA-seq users on this forum than yours truly and maybe they will tell you that removing low counts is justified, but I never do that. Never had to deal with the batch effect you described.

I would suggest that you change the X-axis in these plots. By stretching it in the [-7, 7] range, you are unnecessarily making it difficult to see the business part of your plot. The range should be at most [-5, 5] for the top plot if you wanted to keep it symmetric, but [-5, 3] would also work.

ADD REPLY • link 5 months ago by Mensur Dlakic ★ 28k

0

Entering edit mode

Thank you for the suggestion. Limiting the x-axis did improve the visuals. However, there are only a handful of DEGs that lie beyond abs(log2FC) > 1 with most of them nearing 1. I'm uncertain if this reflects the inherent nature of the data.

ADD REPLY • link 5 months ago by singlecell_bio • 0

score 2 · Answer 1 · 2024-05-30

2

Entering edit mode

5 months ago

ATpoint 85k

Looks like a typical cell line RNA-seq with a strong biological effect and ver low variability between replicates. What you can do is to test against a threshold, e.g. log2(1.5) to focus on genes with large effect sizes, see DESeq2 vignette (lfc argument in results or lfcShrink) or glmTreat in edgeR or treat in limma.

After all, the default test just asks "is logFC different than 0", and this is relatively easy to reject with large biological effects and low variability.

ADD COMMENT • link 5 months ago by ATpoint 85k

0

Entering edit mode

Thank you for this insight! This is a great point and I totally agree. If this is a common trend with cell line based RNA-seq experiments, this may be another explanation as to why we are seeing such results. Like you suggest, I will try to set the threshold at log2(1.5) to focus on genes with large effect sizes with lfcShrink.

ADD REPLY • link 5 months ago by singlecell_bio • 0

score 2 · Answer 2 · 2024-05-30

2

Entering edit mode

5 months ago

swbarnes2 14k

The first PC is 86% of the variance, but the values for that PC are quite small. That suggests to me that the differences in the genes driving PC1 are modest, so that might explain why few have large fold changes.

Though that looks like a whole lot of genes changing. More than I'd expect from a simple chemical treatment. So maybe something did get messed up with combat-seq. I'd try without combat-seq, and include batch in the design instead.

ADD COMMENT • link 5 months ago by swbarnes2 14k

0

Entering edit mode

Thank you for the suggestion! I did initially try it without combat-seq by simply including batch in the design, but the results weren't compelling. Like you suggest, the minimal variance in the first component could explain the observed.

ADD REPLY • link 5 months ago by singlecell_bio • 0

0

Entering edit mode

For context, when I have different related tissues in the same RNASeq experiment, the first PC usually represents about 90% of the variance, though the actual values of the first PC are much higher.

You must have thousands of changed genes there, it's not clear to me that you should expect that from a chemical exposure. It might be that combat-seq is not working right, and the pure DESeq way is returning correct, if disappointing results.

ADD REPLY • link 5 months ago by swbarnes2 14k

0

Entering edit mode

Thank you for the context! Do you recommend including the different tissue types and timepoints in the same analysis to yield better results? I will try to run another iteration without combat-seq.

ADD REPLY • link 5 months ago by singlecell_bio • 0

0

Entering edit mode

You know that merely putting a library on the instrument at a different time doesn't cause a batch effect, right? If the libraries were prepped at the same time, there's no batch effect to correct for.

ADD REPLY • link 5 months ago by swbarnes2 14k

0

Entering edit mode

Libraries were not prepped at the same time. The batches were 4 months apart.

ADD REPLY • link 5 months ago by singlecell_bio • 0

0

Entering edit mode

I'm just saying that to my eyes, your graphs look strange. Maybe your treatment is of a much different nature than anything I've done, and I'm just interpreting wrong. But I've never seen a PCA where the magnitude of the first two PCs was so small. And you have what looks like thousands of genes changing, but the magnitudes of those changes is very slight, and the p-values aren't all that great either. And it's not like you have so many samples that you have the power to confidently call small differences. Your volcano plot is symmetrical, which likely rules out any kind of biased normalization doing anything, which is good. It's almost like your data was logged somehow, giving you counts of small magnitudes, so your PC's don't add up to much, and your fold changes are smaller too.

ADD REPLY • link 5 months ago by swbarnes2 14k