Question

Capped expression values in single cell RNAseq (Tabula Sapiens/scanpy/CellXGene)


13 months ago

matt.a.bennett25890 ▴ 30

Hi all,

I'd be grateful if anyone familiar with Tabula Sapiens or scanpy could help with a question. I'm looking into using Tabula Sapiens (10x Genomics data across several organs) to check out some cell-specific markers, but I'm finding that some of them appear to have their expression values capped. A small proportion of cells overall have expression values of 10.00, which leads to some funny-looking distributions, e.g. the violin plot below of normalised expression values for ANKRD1, a gene whose values are capped at 10.00 in Tabula Sapiens:

image: violin plot

It's hard to tell from the manuscript, but it seems likely the data was normalised with scanpy, which I haven't used before myself. Would this be the source of the capped data? I'm not seeing any info anywhere on why the data looks like this. It seems likely to me that the capping would lead to quite skewed differential expression results and logFC values...

scanpy cellxgene scRNA-seq 10x normalisation


score 0 · Answer 1 · 2024-03-20

For anyone falling down a similar rabbit hole in the future: my working hypothesis is that the values were capped to make visualisation on the UMAP easier, by preventing extreme expression outliers from obscuring the color scale for the vast majority of remaining values.
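
One mechanism that would produce this pattern, though I have not confirmed it is what the Tabula Sapiens authors did: scanpy's `sc.pp.scale` accepts a `max_value` argument that clips scaled values, and `sc.pp.scale(adata, max_value=10)` is a common choice in scanpy tutorials. The sketch below mimics that step in plain Python on a made-up gene (the toy data and the helper names `scale_and_clip` and `fraction_at_cap` are my own illustration, not part of scanpy). It shows that a marker expressed in very few cells gets a large z-score in those cells, which then piles up at the cap.

```python
import statistics

CAP = 10.0  # assumed cap, chosen to match the 10.00 ceiling seen in the violin plot


def scale_and_clip(values, max_value=CAP):
    """Z-score one gene across cells, then clip anything above max_value."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [min((v - mean) / sd, max_value) for v in values]


def fraction_at_cap(values, cap=CAP, tol=1e-6):
    """Fraction of cells whose value sits at the cap (within a tolerance)."""
    return sum(abs(v - cap) <= tol for v in values) / len(values)


# Toy gene (invented data): 999 non-expressing cells and 1 expressing cell,
# roughly what a highly cell-type-specific marker looks like.
toy_gene = [0.0] * 999 + [5.0]
scaled = scale_and_clip(toy_gene)

print("max scaled value:", max(scaled))
print("fraction of cells at cap:", fraction_at_cap(scaled))
```

Running the same `fraction_at_cap` check per gene on the downloaded matrix would be a quick way to see which markers are affected before trusting any fold changes.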

But it probably means the values should be re-computed from the raw counts if doing any type of differential expression analysis on the normalised Tabula Sapiens data.
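
A minimal sketch of what re-computing could look like, assuming the raw counts are available in the object you downloaded (where they are stored is something to check for yourself, I am not asserting a layer name). It does library-size normalisation followed by log1p and deliberately skips any scaling or clipping step. The function name `log_normalise` and the toy counts are hypothetical. In scanpy the equivalent calls would be `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import math


def log_normalise(counts_per_cell, target_sum=1e4):
    """Library-size normalise each cell to target_sum, then apply log1p.

    counts_per_cell: list of cells, each a list of raw counts per gene.
    No scaling or clipping is applied, so no values are capped.
    """
    normalised = []
    for cell in counts_per_cell:
        total = sum(cell)
        if total == 0:
            # Empty cell: leave as zeros rather than dividing by zero
            normalised.append([0.0] * len(cell))
            continue
        factor = target_sum / total
        normalised.append([math.log1p(c * factor) for c in cell])
    return normalised


# Toy raw counts (invented): 3 cells x 3 genes
raw = [
    [1, 0, 9],
    [0, 0, 0],
    [5, 5, 0],
]
normed = log_normalise(raw)
for row in normed:
    print([round(v, 3) for v in row])
```

Values produced this way are unbounded above, so highly expressed markers keep their full dynamic range for differential expression and logFC estimates.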