Hi,
I am new to single-cell RNA-seq analysis, so perhaps my question is naive. But here it is: I used the diffxpy package to find differentially expressed genes between two conditions in a given cluster of cells, using the Wald test. To extract the genes with a significant log fold change, I apply the following condition to the pandas results dataframe:
summary_m = test_megak.summary()
condition_m = (summary_m["qval"] <= 0.05) & (summary_m["log2fc"].abs() >= 1)
gene_indices_m = summary_m.index[condition_m]
gene_indices_list_m = gene_indices_m.tolist()
I end up with a list of around 3,000 indices out of the 23,000 genes tested. But when I draw the diffxpy volcano plot from the test results, it shows only a few points meeting my criteria for differential expression (i.e. corrected p-value <= 0.05 and abs(log2fc) >= 1):
test_megak.plot_volcano(corrected_pval=True,log10_p_threshold=-20,log2_fc_threshold=5, min_fc=2, alpha=0.05, size=20)
I actually know that there should be only a handful of differentially expressed genes, certainly not 3,000. So the volcano plot is closer to what I expect.
Can anybody tell me what I am doing wrong? How can I extract the gene names corresponding to the colored points I see on the volcano plot using the diffxpy API?
DESeq2 is definitely not the gold standard for scRNA-seq.
It's totally fine to perform DE analysis with DESeq2 on scRNA-seq data as well; just keep in mind that some specific parameters need to be changed, as stated in the official documentation.
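On the original question about pulling gene names out of the results table: a minimal pandas-only sketch is below. It does not call diffxpy at all. The toy dataframe stands in for test_megak.summary(), and the column names "gene", "qval", "log2fc" and "mean" are assumptions about what that table contains, so check them against your own output first. The helper significant_genes and its min_mean argument are hypothetical names made up for this example. The optional mean-expression cutoff is just one thing that might be worth checking when far more genes pass the filter than expected; it is not a claim about how the volcano plot selects its colored points.

```python
import pandas as pd

# Toy stand-in for the table returned by test_megak.summary().
# Column names ("gene", "qval", "log2fc", "mean") are assumed, not verified here.
summary = pd.DataFrame({
    "gene": ["Gata1", "Pf4", "Itga2b", "Actb", "Gapdh"],
    "qval": [0.001, 0.04, 0.20, 0.03, 0.90],
    "log2fc": [2.5, -1.2, 3.0, 0.4, -0.1],
    "mean": [5.0, 3.0, 1.0, 50.0, 40.0],
})


def significant_genes(summary, alpha=0.05, min_log2fc=1.0, min_mean=0.0):
    """Return gene names passing the q-value, fold-change and mean cutoffs.

    Hypothetical helper for illustration; not part of the diffxpy API.
    """
    mask = (
        (summary["qval"] <= alpha)
        & (summary["log2fc"].abs() >= min_log2fc)
        & (summary["mean"] >= min_mean)
    )
    # Read the names from the "gene" column rather than from the row index.
    return summary.loc[mask, "gene"].tolist()


genes = significant_genes(summary)
print(genes)
```

With the real results you would pass test_megak.summary() in place of the toy dataframe. Note that this returns the values of the "gene" column, whereas the code in the question collects row index labels.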