3.4 years ago
firestar
I have run differential gene expression on a 10X single-cell dataset in Seurat. See code below.
m <- FindMarkers(obj_dge, ident.1 = test, ident.2 = ref,
                 group.by = "cell_type_condition_cluster",
                 test.use = "MAST", only.pos = FALSE, min.pct = 0.25,
                 logfc.threshold = 0, max.cells.per.ident = 200,
                 min.cells.group = 10, random.seed = 100,
                 assay = "RNA", slot = "data",
                 latent.vars = c("nCount_RNA", "percent_mito", "S.Score", "G2M.Score"))
I get this volcano plot which looks strange.
Here is a histogram of the pvalues for all genes.
Total genes: 7195
DE up genes: 2359
DE down genes: 2218
Total DE genes: 4577
Does anyone know what might be wrong? And how to fix it?
Most volcano plots derived from scRNA-seq data will not look like traditional volcano plots, largely because the fold changes are much lower. It's a very different data type.
This dataset has two cell types, with several clusters within each. For now I am mostly doing between-cluster comparisons within a cell type. The example above is a comparison between clusters within one of the cell types. All comparisons for the other cell type look like regular volcano plots (see attached image), so I am not sure this is a single-cell-specific issue. I only see it with one of the cell types, and I am running exactly the same code. I suspect a statistical issue with the null, or a latent variable I am not accounting for. Or could it even be biological? I am not really sure how to go about investigating this.
Granted, I don't have much experience with scRNA-seq... To get so many genes with such small p-values, I assume you are treating each cell as a replicate, so that you are comparing two groups each made of many replicates. This gives you very high power to detect genes that deviate even minimally from the null hypothesis of no difference. That's why you have so many red genes in your plot. Perhaps you could get a clearer picture if you colour the points by density (heatmap colours) to avoid the problem of overplotting (see this post of mine for an example: "How to visualise differential expression analysis").
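To illustrate the point about power, here is a minimal simulation sketch in base R. Everything in it is an assumption made for illustration: the values are simulated from normal distributions, the true shift of 0.1 standard deviations is arbitrary, and a plain t-test stands in for MAST. It only shows how the p-value for the same tiny difference shrinks as the number of "replicates" (cells) per group grows.

```r
# Simulated illustration only: the same small true difference tested
# with increasing numbers of cells per group.
set.seed(42)

tiny_shift <- 0.1   # arbitrary small true difference, in SD units

p_for_n <- function(n) {
  group_a <- rnorm(n, mean = 0, sd = 1)
  group_b <- rnorm(n, mean = tiny_shift, sd = 1)
  t.test(group_a, group_b)$p.value   # Welch t-test, not MAST
}

cells_per_group <- c(20, 200, 2000, 20000, 200000)
pvals <- sapply(cells_per_group, p_for_n)

# One row per group size; p-values tend to fall sharply as n grows.
print(data.frame(cells_per_group = cells_per_group, p_value = signif(pvals, 3)))
```

The effect size never changes here, only the number of observations, so a small p-value on its own says little about whether a difference is large enough to matter.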
You should try the pseudobulk approach for differential expression analysis when comparing two conditions in scRNA-seq datasets. The default Seurat method tends to give you inflated significance (very small p-values) because it compares many individual cells between the two conditions as if each were an independent replicate. Instead, you need to sum the counts for each cell type within each biological replicate and then use DESeq2 or edgeR for proper differential expression analysis of each cell type. I must say, this field is still growing, so keep looking out for recent papers that compare two conditions in scRNA-seq data.
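A rough sketch of the aggregation step could look like the following. It runs on simulated counts, and the names are hypothetical: `sample_id` stands for whatever metadata column identifies the biological replicate, and clusters "A" and "B" stand for the two groups being compared. With a Seurat object you would pull the raw counts matrix and the cell metadata out of the object rather than simulating them. The DESeq2 part only runs if the package is installed.

```r
# Simulated data: 200 genes, 4 hypothetical replicates, 2 clusters, 50 cells each.
set.seed(1)
n_genes <- 200
meta <- expand.grid(cell = seq_len(50),
                    sample_id = paste0("s", 1:4),   # hypothetical replicate IDs
                    cluster = c("A", "B"),
                    stringsAsFactors = FALSE)
meta$group <- paste(meta$sample_id, meta$cluster, sep = "_")

# Per-gene mean count per cell, with replicate-level noise per sample/cluster.
groups <- unique(meta$group)
base_mu <- rgamma(n_genes, shape = 2, rate = 1)
group_factor <- matrix(exp(rnorm(n_genes * length(groups), sd = 0.3)),
                       nrow = n_genes, dimnames = list(NULL, groups))
mu <- base_mu * group_factor[, meta$group]          # genes x cells
counts <- matrix(rpois(length(mu), mu), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_len(nrow(meta)))))

# Pseudobulk: sum raw counts over all cells of each sample/cluster combination.
pseudobulk <- t(rowsum(t(counts), group = meta$group))

# One metadata row per pseudobulk column, in the same order as the columns.
col_data <- unique(meta[, c("group", "sample_id", "cluster")])
col_data <- col_data[match(colnames(pseudobulk), col_data$group), ]
rownames(col_data) <- col_data$group
col_data$cluster <- factor(col_data$cluster, levels = c("A", "B"))

# Differential expression on the aggregated counts, if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  dds <- DESeqDataSetFromMatrix(countData = pseudobulk,
                                colData = col_data,
                                design = ~ cluster)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds, contrast = c("cluster", "B", "A"))
  n_sig <- sum(res$padj < 0.05, na.rm = TRUE)
}
```

The replicates are now the samples rather than the cells, so the test has 8 columns to work with instead of 400. Since every sample contributes to both clusters in this sketch, a paired design such as `~ sample_id + cluster` may be worth considering. This approach also needs several biological replicates per group to be meaningful.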