How can I avoid artifacts in gene set/pathway scoring by UCell and similar algorithms?
22 months ago

Hey people,

I’m analyzing scRNA-seq data from mice in 6 different biological groups. I am using Seurat (“MetaFeatures”/“AddModuleScore”) and UCell/ssGSEA (via “escape”) to look for differences in pathway/gene set representation between these groups. Looking at the results for hundreds of pathways/gene sets, I’ve noticed that most of them look very similar to one another. I am now quite certain that, in most cases, the (many) differences I see between the experimental groups in their scores for specific pathways/gene sets are an artifact.

I suspect that the problem stems from differences between the samples in the average number of unique genes detected per cell (“nFeature”) and/or in absolute cell numbers. I’m attaching an image with some graphs that exemplify the issue (I’ve removed group/set names, because I’m not allowed to reveal them). The top row shows the factors I suspect may cause the problem, while the bottom row shows UCell scores for a few gene sets that exemplify it (I’ve gotten similar results when using Seurat’s “MetaFeatures”/“AddModuleScore” functions). Also, as you can see, two of the six groups are from one batch (“Batch 1”) and the other 4 groups are from a different batch (“Batch 2”). Each group had its own (separate) lane on the 10x Chromium platform.

[Example image]
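One way to test the suspicion above is to check how strongly a per-cell gene set score tracks nFeature, and whether the group differences survive once that linear dependence is removed. The snippet below is only a minimal sketch in plain Python: the values in `n_feature` and `score` are made-up toy numbers (not taken from this dataset), and `pearson` and `residualize` are hypothetical helper names, not functions from Seurat, UCell, or escape.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residualize(score, covariate):
    """Residuals of the simple linear regression score ~ covariate."""
    n = len(score)
    mean_c = sum(covariate) / n
    mean_s = sum(score) / n
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(covariate, score))
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    slope = sxy / sxx
    intercept = mean_s - slope * mean_c
    return [s - (intercept + slope * c) for s, c in zip(score, covariate)]


# Toy, made-up per-cell values (assumption: one score per cell for one gene set).
n_feature = [800, 1200, 1500, 2100, 2600, 3000]
score = [0.10, 0.14, 0.18, 0.22, 0.29, 0.31]

r_before = pearson(n_feature, score)
adjusted = residualize(score, n_feature)
r_after = pearson(n_feature, adjusted)

print(f"correlation with nFeature before adjustment: {r_before:.3f}")
print(f"correlation with nFeature after adjustment:  {r_after:.3f}")
```

If many unrelated gene sets all show a high correlation with nFeature, that would be consistent with a technical artifact rather than biology. Note that regressing out nFeature is itself an assumption: if the groups genuinely differ in the number of expressed genes, this adjustment could also remove real signal.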

The data were normalized and integrated using Seurat before running the MetaFeatures/AddModuleScore/UCell/ssGSEA functions.

Any idea what I can do in order to remove these artifacts, so that I can get meaningful results?

Cheers, Omer

ssGSEA Seurat escape UCell