Question

Bad volcano plot

0

3.6 years ago

firestar ★ 1.7k

I have run differential gene expression on a 10X single-cell dataset in Seurat. See code below.

    m <- FindMarkers(obj_dge, ident.1 = test, ident.2 = ref,
                     group.by = "cell_type_condition_cluster",
                     test.use = "MAST", only.pos = FALSE, min.pct = 0.25,
                     logfc.threshold = 0, max.cells.per.ident = 200,
                     min.cells.group = 10, random.seed = 100,
                     assay = "RNA", slot = "data",
                     latent.vars = c("nCount_RNA", "percent_mito", "S.Score", "G2M.Score"))

I get this volcano plot which looks strange.

[Image: volcano plot]

Here is a histogram of the p-values for all genes.

[Image: histogram of p-values]

Total genes: 7195

DE up genes: 2359
DE down genes: 2218
Total DE genes: 4577

Does anyone know what might be wrong? And how to fix it?

diferential-gene-expression rna-seq sc-rnaseq • 4.1k views

ADD COMMENT • link updated 3.6 years ago by m.sadman.sakib ▴ 120 • written 3.6 years ago by firestar ★ 1.7k

2

Most volcano plots derived from scRNA-seq data will not look like traditional volcano plots, mainly because the fold changes are much lower. It's a very different data type.

ADD REPLY • link 3.6 years ago by Kevin Blighe 89k

0

This dataset has two cell types, with several clusters within each cell type. For now I am mostly doing between-cluster comparisons within a cell type. The example above is a comparison between clusters within one of the cell types. All comparisons for the other cell type look like regular volcano plots (see attached image), so I am not sure this is a single-cell-specific issue. I only see the problem with one of the cell types, and I am running exactly the same code for both. I suspect a statistical issue with the null, or that I am not accounting for a latent variable. Or could it even be biological? I am not really sure how to go about investigating this.

[Image: attached plot]

ADD REPLY • link 3.6 years ago by firestar ★ 1.7k

2

Granted, I don't have much experience with scRNA-seq... To get so many genes with such small p-values, I assume you are treating each cell as a replicate, so that you are comparing two groups each made of thousands of replicates. This gives you very high power to detect genes that deviate even minimally from the null hypothesis of no difference. That's why you have so many red genes in your plot. Perhaps you could get a clearer picture if you use heatmap colours to avoid the problem of overplotting (see this post of mine for an example: How to visualise differential expression analysis).
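
The point about power can be illustrated with a small simulation. This is only a sketch on made-up data, not the poster's dataset: `p_for_n` is a hypothetical helper, the 0.1 standard deviation shift is an arbitrary choice, and a plain t-test stands in for MAST. Note that the call in the question caps each group at 200 cells via `max.cells.per.ident`, so 200 is included as one of the group sizes.

```r
# Illustration only: simulated values, not the poster's data.
set.seed(100)

# Hypothetical helper: p-value of a two-sample t-test for a fixed,
# small shift in mean (0.1 standard deviations) with n cells per group.
p_for_n <- function(n, shift = 0.1) {
  a <- rnorm(n, mean = 0)
  b <- rnorm(n, mean = shift)
  t.test(a, b)$p.value
}

# Median p-value over 200 simulated genes for increasing group sizes.
group_sizes <- c(10, 200, 5000)
median_p <- sapply(group_sizes, function(n) {
  median(replicate(200, p_for_n(n)))
})
names(median_p) <- group_sizes
print(signif(median_p, 3))
```

The effect size is the same in every run; only the number of cells treated as replicates changes, and the typical p-value falls as that number grows.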

ADD REPLY • link 3.6 years ago by dariober 15k

0

You should try the pseudobulk approach for differential expression analysis when comparing two conditions in scRNA-seq datasets. The default Seurat method treats every cell as a replicate, so comparing thousands of cells between two conditions gives artificially small p-values (inflated significance). Instead, collapse the reads for each cell type into individual biological replicates and then use DESeq2 or edgeR for proper differential expression analysis within each cell type. I must say, this field is still growing, so keep looking out for recent papers that compare two conditions in scRNA-seq data.
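
The collapsing step described above could look roughly like the following in base R. This is a minimal sketch on simulated counts: the `sample` and `cluster` labels are hypothetical stand-ins for whatever per-cell metadata identifies the biological replicate and the cluster, and none of this is taken from the poster's object.

```r
# Illustration only: simulated counts and hypothetical sample labels.
set.seed(100)
n_genes <- 5
n_cells <- 60

# Genes x cells count matrix, as in a single-cell RNA assay.
counts <- matrix(rpois(n_genes * n_cells, lambda = 2),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", 1:n_genes),
                                 paste0("cell", 1:n_cells)))

# Hypothetical per-cell metadata: biological sample and cluster of origin.
meta <- data.frame(
  sample  = rep(paste0("sample", 1:3), each = 20),
  cluster = rep(c("clusterA", "clusterB"), times = 30),
  row.names = colnames(counts)
)

# One pseudobulk column per sample-by-cluster combination:
# sum the raw counts of all cells that share the same label.
group <- paste(meta$sample, meta$cluster, sep = "_")
pseudobulk <- t(rowsum(t(counts), group = group))

print(dim(pseudobulk))
print(pseudobulk)
```

The resulting genes-by-groups matrix is the kind of input DESeq2 or edgeR expects, with each sample acting as one replicate. Both tools work on raw counts, so the aggregation would start from the counts rather than from the normalised `data` slot used in the `FindMarkers` call.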

ADD REPLY • link 3.6 years ago by m.sadman.sakib ▴ 120