DESeq2 positive results for genes highly variable between replicates
5.4 years ago
guillaume.rbt ★ 1.0k

Hi all,

I'm using DESeq2 to find differentially expressed genes between two conditions from RNAseq data, with lots of replicates (46 in condition "1", 20 in condition "2").

I get results with significant adjusted p-values, but for most of them the gene expression values are highly variable between replicates.

For example, for the gene with the lowest adjusted p-value, all samples from both conditions have low normalized counts (around 10), except for one sample in one condition with >200000 normalized counts, which drives the differential expression toward this condition.

See the log2(normalized counts + 1) boxplot below (the adjusted p-value is 8.05e-12, and the log2FC is -5.87 between conditions "1" and "2" for this gene):

[Figure: boxplot of log2(normalized counts + 1) per condition for this gene]

Here is the code I used:

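library(DESeq2)

# Build the dataset from the tximport output, with condition as the only design factor
dds <- DESeqDataSetFromTximport(tx_import_data, coldata, ~condition)
# Pre-filter: keep genes with at least 10 reads summed over all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
# Set the reference level of the condition factor
dds$condition <- relevel(dds$condition, ref = "R")
# Fit the model and extract results at a 5% FDR threshold
dds <- DESeq(dds)
res05 <- results(dds, alpha=0.05)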

I'm wondering whether it is "normal" for DESeq2 to keep this kind of result, in which case I should filter it out myself if I find it irrelevant, or whether I made a mistake somewhere in the process and DESeq2 should only keep genes without such expression dispersion between replicates.

Thanks for your help

deseq2 RNAseq • 2.2k views

With only words but no plots illustrating your question it is difficult to make any statements. Please provide e.g. some boxplots of normalized counts or tables.


Ok I've just put a link with a boxplot illustrating my example.


log2 scale please ;-) and see How to add images to a Biostars post. You have to paste the link with the full suffix like https...foo.png to the image box.


done ;) sorry I never uploaded a plot before


I would check whether these outlier samples also show outlier-like behaviour in a PCA, which might indicate a batch effect, and if so, think about removing them.
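
A minimal base-R sketch of such a PCA check, assuming a genes x samples matrix of normalized counts such as the one returned by counts(dds, normalized = TRUE). The matrix here is simulated and purely hypothetical, and the 3-MAD cutoff used to flag distant samples is an arbitrary choice, not a DESeq2 default:

```r
# Hypothetical stand-in for counts(dds, normalized = TRUE):
# 500 genes x 12 samples of simulated counts (not real data).
set.seed(1)
norm_counts <- matrix(
  rnbinom(500 * 12, mu = 100, size = 5),
  nrow = 500,
  dimnames = list(paste0("gene", 1:500), paste0("s", 1:12))
)
condition <- factor(rep(c("1", "2"), each = 6))

# Same transform as in the boxplot: log2(normalized counts + 1).
log_counts <- log2(norm_counts + 1)

# Drop genes with zero variance, then run PCA with samples as rows.
log_counts <- log_counts[apply(log_counts, 1, var) > 0, , drop = FALSE]
pca <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)

# Percentage of variance explained by each principal component.
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Flag samples lying far from the others on PC1 or PC2
# (more than 3 median absolute deviations; an arbitrary threshold).
is_far <- function(x) abs(x - median(x)) > 3 * mad(x)
pca_outliers <- rownames(pca$x)[is_far(pca$x[, 1]) | is_far(pca$x[, 2])]
```

Plotting pca$x[, 1] against pca$x[, 2], coloured by condition, gives the usual PCA scatter; pca_outliers lists any samples that sit unusually far from the rest on the first two components.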


Ok thanks, I've checked that and unfortunately they don't seem to be different from the other ones on the PCA.


In my experience, this kind of result typically stems from very high variability between samples of the same group (compared to the variability between groups). You may want to correct for possible covariates in your data (see svaseq) or simply filter out results with high dispersion.
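
One way to filter such results could be sketched as follows, in base R. This is a simple heuristic, not a DESeq2 function: it flags genes whose signal comes almost entirely from a single sample. The matrix, the gene names and both thresholds (90% of the total, 10-fold over the next sample) are hypothetical and would need tuning on real data:

```r
# Hypothetical stand-in for counts(dds, normalized = TRUE): 4 genes x 6 samples.
norm_counts <- rbind(
  geneA = c(10, 12, 9, 11, 200000, 8),     # one extreme sample, like the example gene
  geneB = c(10, 12, 9, 11, 10, 8),         # uniformly low
  geneC = c(100, 120, 90, 400, 380, 420),  # consistent shift between groups
  geneD = c(0, 0, 0, 0, 0, 5)              # near-zero everywhere
)
colnames(norm_counts) <- paste0("s", 1:6)

# Fraction of each gene's total normalized count held by its largest sample.
gene_max <- apply(norm_counts, 1, max)
top_share <- gene_max / rowSums(norm_counts)

# Ratio of the largest to the second-largest value (with a +1 pseudocount).
second_max <- apply(norm_counts, 1, function(x) sort(x, decreasing = TRUE)[2])
max_ratio <- (gene_max + 1) / (second_max + 1)

# Arbitrary thresholds: one sample holds > 90% of the counts
# and is at least 10x larger than the next sample.
single_sample_driven <- top_share > 0.9 & max_ratio >= 10
flagged <- names(which(single_sample_driven))
```

Here only geneA is flagged. On real data, the flagged gene names could be used to drop rows from the results table, for example with res05[!(rownames(res05) %in% flagged), ]. DESeq2 also has its own Cook's distance based outlier handling (the cooksCutoff argument of results()), which is separate from this heuristic.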
