Capped expression values in single cell RNAseq (Tabula Sapiens/scanpy/CellXGene)
8 months ago

Hi all,

If anyone familiar with Tabula Sapiens or scanpy could answer a question, I'd be grateful. I'm using Tabula Sapiens (10x Genomics data across several organs) to look at some cell-specific markers, but some of them appear to have their expression values capped. A small proportion of cells have expression values of exactly 10.00, which produces some odd-looking distributions. For example, the violin plot below shows normalised expression values for ANKRD1, a gene whose values are capped at 10.00 in Tabula Sapiens:

[Figure: violin plot of normalised ANKRD1 expression values in Tabula Sapiens]
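One quick way to check which genes are affected is to count how many cells sit exactly at the suspected ceiling, since a genuine maximum rarely repeats across many cells. The sketch below is plain NumPy and assumes you have already pulled one gene's values out as a 1-D array (with an AnnData object whose var names are gene symbols, something like `adata[:, "ANKRD1"].X`, converted with `.toarray().ravel()` if `X` is sparse). The helper name `capped_fraction` and the tolerance are illustrative choices, not part of scanpy.

```python
import numpy as np

def capped_fraction(values, cap=10.0, atol=1e-6):
    """Return (fraction, count) of cells whose value equals `cap`.

    `atol` allows for small floating-point error in the stored values.
    """
    values = np.asarray(values, dtype=np.float64)
    at_cap = np.isclose(values, cap, rtol=0.0, atol=atol)
    return at_cap.mean(), int(at_cap.sum())

# Toy data: mostly modest values plus two cells pinned at exactly 10.0
toy = np.array([0.0, 0.5, 1.2, 2.3, 10.0, 10.0, 0.0, 3.1])
frac, n = capped_fraction(toy)
print(f"{n} of {toy.size} cells ({frac:.1%}) sit at the cap")
# → 2 of 8 cells (25.0%) sit at the cap
```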

It's hard to tell from the manuscript, but it seems likely the data were normalised with scanpy, which I haven't used myself. Could that be the source of the capped values? I can't find any explanation of why the data look like this, and it seems likely to skew differential expression results and logFC values.
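For what it's worth, scanpy does have a step that produces exactly this kind of ceiling: `sc.pp.scale` z-scores each gene across cells and, when given `max_value` (the standard scanpy clustering tutorial uses `max_value=10`), clips values above it. Whether the stored Tabula Sapiens values came from this step is an assumption, not something confirmed here. The NumPy sketch below re-implements the idea (it is not scanpy's code; `scale_and_clip` is a hypothetical helper, and it clips only the upper tail) to show why a sparsely expressed marker hits the cap: when almost all cells are zero, the few expressing cells get very large z-scores.

```python
import numpy as np

def scale_and_clip(X, max_value=10.0):
    """Z-score each gene (column) across cells, then clip the upper tail.

    Illustrates the idea behind sc.pp.scale(..., max_value=...);
    max_value=None disables clipping.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    Z = (X - mean) / std
    if max_value is not None:
        Z = np.minimum(Z, max_value)
    return Z

# Toy gene: 1,000 cells, only the first 5 express the marker
gene = np.zeros((1000, 1))
gene[:5, 0] = 4.0
Z = scale_and_clip(gene)
print("distinct scaled values:", np.unique(Z.round(3)))
```

In this toy gene the five expressing cells have an unclipped z-score of about 14, so after clipping all five sit at exactly 10.0, the same kind of pile-up seen in the violin plot. If that is what happened in Tabula Sapiens, the 10.00 values would be clipped z-scores rather than log-normalised counts.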

Tags: scanpy, cellxgene, scRNA-seq, 10x, normalisation
8 months ago

For anyone falling down a similar rabbit hole in the future: my working hypothesis is that the values were capped to make the UMAP easier to read, so that extreme expression outliers don't compress the colour scale for the vast majority of remaining values.
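If the aim is only a readable colour scale, outliers can be tamed at plot time instead of in the stored matrix. Recent scanpy plotting functions such as `sc.pl.umap` accept a percentile string for the colour limit (e.g. `vmax="p99"`; check the docs for your version). The NumPy sketch below shows the same idea explicitly: compute a percentile-based `vmax`, clip a copy for colouring only, and leave the original values untouched. `colour_values` and its 99th-percentile default are illustrative assumptions.

```python
import numpy as np

def colour_values(values, upper_pct=99.0):
    """Return a clipped copy of `values` for colouring, plus the vmax used.

    The input array is not modified; downstream analyses should keep
    using the original values.
    """
    values = np.asarray(values, dtype=np.float64)
    vmax = float(np.percentile(values, upper_pct))
    return np.clip(values, None, vmax), vmax

# Toy data: 99 ordinary values plus one extreme outlier
expr = np.arange(100.0)
expr[-1] = 500.0
for_plot, vmax = colour_values(expr)
print(f"colour scale capped at {vmax:.2f}; stored maximum is still {expr.max():.0f}")
```

For a marker that is zero in more than 99% of cells, the 99th percentile is itself 0, so in that case the limit is better computed over the expressing cells only.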

However, this probably means the normalised values should be recomputed before running any kind of differential expression analysis on Tabula Sapiens data.
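A minimal sketch of recomputing from raw counts, assuming you can get the raw UMI counts for the cells of interest (where they are stored varies between files, so check `adata.layers` and `adata.raw`). It applies the usual library-size normalisation plus `log1p` in NumPy, then compares a crude group-mean log fold change on the recomputed values against a copy capped at 10. `target_sum=1e6` is chosen only so the toy values can exceed 10 (with `target_sum=1e4`, log1p values cannot exceed ln(10001) ≈ 9.21); it is not a claim about the Tabula Sapiens pipeline, and `log_normalise` and `mean_diff` are hypothetical helpers.

```python
import numpy as np

def log_normalise(counts, target_sum=1e6):
    """Per-cell library-size normalisation followed by log1p.

    Same idea as sc.pp.normalize_total + sc.pp.log1p, written in NumPy.
    """
    counts = np.asarray(counts, dtype=np.float64)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # empty cells stay at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)

def mean_diff(values, in_group):
    """Difference of group means on the log1p scale (a crude log fold change)."""
    return values[in_group].mean() - values[~in_group].mean()

# Toy data: 6 cells x 2 genes; gene 0 is a marker that is high in cells 0-2
counts = np.array([[900, 100], [800, 200], [950, 50],
                   [1, 999], [0, 1000], [2, 998]])
in_group = np.array([True, True, True, False, False, False])

lognorm = log_normalise(counts)
capped = np.minimum(lognorm, 10.0)  # mimic values stored with a ceiling of 10

print("recomputed:", round(mean_diff(lognorm[:, 0], in_group), 2))
print("capped:    ", round(mean_diff(capped[:, 0], in_group), 2))
```

Here the capped copy reports a much smaller difference for the marker (about 5.2 versus 8.9 log units), because every high-expressing cell is flattened to the same value.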
