3.1 years ago
ishakbishara91
I computed the Pearson correlation between the ssGSEA scores for each of the 50 Hallmark pathways and the gene count per cell (nCount) in my data. Most Hallmark ssGSEA scores correlate with the gene count.
Is there a biological reason for this correlation, or is it purely a technical artifact? I also noticed that a more stringent nCount filtering cut-off weakens the correlation between nCount and the pathway scores.
Note: A Pearson correlation coefficient cut-off of +/- 0.3 separates the red points from the blue points.
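For readers who want to reproduce this kind of check, here is a minimal sketch of the per-pathway correlation and the 0.3 cut-off. It is not the code used for the question: the function names `pearson` and `flag_pathways`, the pathway labels, and the toy numbers are all made up for illustration, and it assumes the cut-off is applied to the absolute value of the coefficient.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy / sqrt(sxx * syy)

def flag_pathways(scores, gene_counts, cutoff=0.3):
    """Map each pathway to (r, flagged), where flagged means |r| >= cutoff.

    Assumption: the +/- 0.3 cut-off is applied to the absolute correlation.
    """
    out = {}
    for name, values in scores.items():
        r = pearson(values, gene_counts)
        out[name] = (r, abs(r) >= cutoff)
    return out

# Toy data, invented for illustration only (one value per cell).
gene_counts = [600, 1200, 2200, 3100, 4000]
scores = {
    "HALLMARK_A": [0.10, 0.20, 0.30, 0.40, 0.50],  # rises with gene count
    "HALLMARK_B": [0.30, 0.10, 0.30, 0.10, 0.30],  # unrelated to gene count
}
result = flag_pathways(scores, gene_counts)
for name, (r, flagged) in result.items():
    print(name, round(r, 3), "flagged" if flagged else "not flagged")
```

Re-running the same check after each nCount filtering threshold would show how the number of flagged pathways changes with the cut-off.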
How low are your gene counts? Are you using normalized data for ssGSEA?
Average gene count/cell is about 2200. I set my low cut-off for the gene count at 500. I'm using ZinbWave normalized data for calculating the ssGSEA scores.
Are you using this for single-cell data? It's designed for bulk.
What are you referring to by "this"?
ssGSEA analysis
Yes, you're correct, although I found some studies that ran GSEA/ssGSEA at the single-cell level. My logic was that ZinbWave would correct for the sparsity before ssGSEA, but I don't know of any studies that have benchmarked this approach.
I haven't tried tools that are specifically designed for gene set enrichment at the single-cell level, like PAGODA2 and VISION, since the literature on them is still scarce.
Another approach is to perform ssGSEA on pseudo-bulk data by summing counts either by cluster or by sample/cell type. But this is a last resort, since it will decrease the resolution.
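To make the pseudo-bulk idea concrete, here is a minimal sketch of summing counts by group. The function name `pseudobulk`, the cluster labels, and the toy matrix are hypothetical, and it assumes the inputs are raw counts with genes in the same order for every cell.

```python
def pseudobulk(counts, groups):
    """Sum per-cell count vectors within each group.

    counts: list of per-cell lists, one count per gene (same gene order)
    groups: list of group labels (cluster or sample/cell type), one per cell
    """
    if len(counts) != len(groups):
        raise ValueError("one group label is required per cell")
    summed = {}
    for cell, label in zip(counts, groups):
        if label not in summed:
            summed[label] = [0] * len(cell)
        elif len(cell) != len(summed[label]):
            raise ValueError("all cells must have the same number of genes")
        # Element-wise sum of this cell's counts into its group's profile.
        summed[label] = [a + b for a, b in zip(summed[label], cell)]
    return summed

# Toy data, invented for illustration: 4 cells x 3 genes, 2 clusters.
counts = [
    [1, 0, 3],
    [0, 2, 1],
    [4, 0, 0],
    [1, 1, 1],
]
groups = ["c1", "c1", "c2", "c2"]
profiles = pseudobulk(counts, groups)
print(profiles)
```

Each resulting profile is one pseudo-bulk "sample" that could then be normalized and scored with ssGSEA, at the cost of losing per-cell resolution.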
What do you recommend?
If you are worried about the scarcity of literature, AddModuleScore from Seurat is probably what most papers that use Seurat rely on.
Whether you want to do this on single-cell or pseudo-bulk level depends on the exact question you are trying to answer.