Hi all
I have two Seurat object for scRNA-Seq and extract the raw count from the object:
#crating two matrix for raw counts
ccl4_count <- as.matrix(ccl4_list@assays[["RNA"]]@counts)
ctrl_count <- as.matrix(ctrl_list@assays[["RNA"]]@counts)
Then I use the DESeq2 package for normalization and DEG analysis:
#creating one expression matrix
mat.ctrl_ccl4 <- cbind(ctrl_count,ccl4_count)
#grouping
gr <- factor(c(rep("ctrl", ncol(ctrl_count)) , rep("ccl4", ncol(ccl4_count))))
coldata <- data.frame(group=gr)
dds <- DESeqDataSetFromMatrix( mat.ctrl_ccl4 , coldata, design= ~group)
#the script bellow takes a long time :|
dds <- DESeq(dds)
normalizedCount <- log2(1+counts(dds, normalized=TRUE))
#finding DEG
dif <- data.frame(results(dds, c("group", "ccl4" , "ctrl")))
That is all my script. Eventually in the dif
table, I have about 2000 genes with 0 p-value and now I don't know how to deal with these genes. They make the volcano plot seem incorrect.
The other problem is that normalization of single-cell data by DESeq2 is really time-consuming because of the large number of cells.
Does anyone have any idea of how to efficiently perform DEG analysis for single-cell analysis? Thanks for any help.
The "weird" plot is based on pvalues that are near or identical to the machine limit of displaying numericals. You can get that with
Not much to do about that other than in any case using something more appropriate for scRNA-seq data. Either the Wilcox test or something like limma-trend based on the logCPMs. This here is a great read to get started:
https://bioconductor.org/books/release/OSCA/
Pseudobulk methods have been showing promise for DE in single-cell. Check this paper out