9 weeks ago
frarodmar17
I am analyzing an integrated single-cell dataset and I am getting a very large number of differentially expressed genes. Is it normal to get around 3,000 differentially expressed genes in a specific contrast when the whole dataset contains around 17,000 genes?
Your question lacks the information needed to answer it.
Species, experimental design, replicates, metadata to compare (goal of the study), differential expression method, contrast design, cutoff, pvalues, multiple hypothesis testing, adjusted pvalues... ?
I am comparing three experimental conditions related to a disease (two types of the disease and one control condition), using the FindMarkers function provided by Seurat. Specifically, I am using the Wilcoxon rank-sum test, with thresholds of |log2FC| > 1 and adjusted p-value < 0.01. Regarding the experimental design, I have around 250,000 cells per condition.
How many samples do you have for each of your 3 conditions? What cell type are you comparing between the different conditions?
The Wilcoxon rank-sum test is not the most suitable tool for finding DEGs between conditions; have a look at edgeR, DESeq2 or limma-voom.
As I am analysing single-cell data, my analysis is focused on cells, not samples. As I said before, I have around 250k cells per condition. The number of cells per subtype varies. I thought that the Seurat Wilcoxon test was the best option for analysing single-cell RNA-seq data. edgeR, DESeq2 and limma-voom are more focused on bulk RNA-seq data, no?
Your two comments contradict each other.
Are you analysing one cell type between different conditions, or two cell types in the same condition?
If the latter applies, you can use the Wilcoxon rank-sum test; otherwise, a pseudobulk approach gives much more robust DEGs, provided you have sample replicates.
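To make "pseudobulk" concrete, here is a minimal sketch in plain Python (not Seurat code). It assumes you have raw counts per cell plus a sample label and a cell type label for each cell. All cell, sample and gene names below are made up for illustration. The idea is to sum the raw counts of every cell of the chosen cell type within each sample, so that the unit of replication becomes the sample rather than the cell.

```python
from collections import defaultdict


def pseudobulk(counts, cell_sample, cell_type, keep_type):
    """Sum raw counts over all cells of one cell type within each sample.

    counts:      dict mapping cell id -> dict of gene -> raw count
    cell_sample: dict mapping cell id -> sample (donor) id
    cell_type:   dict mapping cell id -> cell type label
    keep_type:   the single cell type to aggregate
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        if cell_type[cell] != keep_type:
            continue  # only aggregate the cell type under study
        sample = cell_sample[cell]
        for gene, n in gene_counts.items():
            totals[sample][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {s: dict(g) for s, g in totals.items()}


# Hypothetical toy data: four cells from two samples
counts = {
    "c1": {"GeneA": 3, "GeneB": 0},
    "c2": {"GeneA": 1, "GeneB": 2},
    "c3": {"GeneA": 0, "GeneB": 5},
    "c4": {"GeneA": 7, "GeneB": 1},
}
cell_sample = {"c1": "ctrl_1", "c2": "ctrl_1", "c3": "disease_1", "c4": "disease_1"}
cell_type = {"c1": "T cell", "c2": "T cell", "c3": "T cell", "c4": "B cell"}

pb = pseudobulk(counts, cell_sample, cell_type, "T cell")
print(pb)
```

The result has one count profile per sample, and that sample-level matrix is what you would pass to edgeR, DESeq2 or limma-voom. With 250k cells per condition but only a few samples, the effective number of replicates is the number of samples, which is why the cell-level test returns so many significant genes.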
Sorry, I had not understood you before. I am analysing 1 cell type between different conditions. Okay, thank you very much! I will try with pseudobulk approach.