Question

Running AUCell on integrated scRNAseq data

0

2.7 years ago

Alex Gibbs ▴ 90

Hi everyone,

Posting here as I can't quite seem to find a definite answer.

I would like to assess cell-level pathway activity using AUCell on my integrated dataset. The vignette says to use the 'data' slot, which for me is a 3000 x 10024 (genes x cells) expression matrix. However, this expression matrix consists of scaled data (using 3000 highly variable genes).

I am wondering whether using this data as the input for AUCell will limit my results. Should I be using all genes for every cell instead? If so, how do I obtain a normalised expression matrix containing every gene for every cell from my integrated Seurat object without having to rerun the analysis and subsequently losing the clustering?

Thanks for your help and advice in advance.

Alex

scRNA-seq AUCell SCENIC Seurat • 4.3k views

updated 2.0 years ago by jv ★ 1.9k • written 2.7 years ago by Alex Gibbs ▴ 90

0

Hi there!

I have the same question. How did it go? I ran my analysis on both the integrated data and a subset of it, and AUCell seems to give different results when I look at the scores on my UMAP plot with the same scale. I would really appreciate any feedback on this analysis.

Thank you

2.0 years ago by PBC ▴ 10

0

Please start a new post and provide code and figures to better explain the issue.

2.0 years ago by jv ★ 1.9k

score 1 · Answer 1 · 2023-03-13

The Seurat data slot doesn't hold "scaled" data; that's what the scale.data slot is for. See https://github.com/satijalab/seurat/wiki/Assay#slots for more details. My guess is that you are referring to "normalized" data.

The above point aside, it sounds like you are specifically looking at the "integrated" assay, which only includes the highly variable genes used for integration. However, AUCell doesn't care about the integration, and as noted in the tutorial:

Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure.
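To illustrate the ranking argument, here is a minimal Python sketch of a recovery-curve score for a single cell. It is a simplified stand-in, not AUCell's actual implementation: the function name `recovery_auc`, the toy genes and the counts are all made up for illustration, and ties are broken by input order here, whereas AUCell handles ties differently.

```python
import math

def recovery_auc(expr, gene_set, max_rank):
    """Simplified, AUCell-style score for one cell (illustrative only).

    expr: dict mapping gene name -> expression value in this cell
    gene_set: set of gene names to score
    max_rank: only the top `max_rank` ranked genes contribute
    """
    # Rank genes from highest to lowest expression; sorted() is stable,
    # so tied genes keep their input order.
    ranked = sorted(expr, key=lambda g: -expr[g])
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    # Largest possible area: all gene-set members sit at the very top.
    n = min(len(gene_set), max_rank)
    max_area = sum(min(i, n) for i in range(1, max_rank + 1))
    return area / max_area

# Hypothetical raw counts for one cell.
counts = {"A": 50, "B": 0, "C": 12, "D": 7, "E": 3, "F": 0}
total = sum(counts.values())
# Library-size normalisation followed by log1p, applied within the cell.
log_norm = {g: math.log1p(c / total * 1e4) for g, c in counts.items()}

gene_set = {"A", "D"}
raw_score = recovery_auc(counts, gene_set, max_rank=3)
norm_score = recovery_auc(log_norm, gene_set, max_rank=3)
print(raw_score, norm_score)
```

Both scores are identical because per-cell normalisation and log transformation preserve the order of genes within a cell. Per-gene scaling across cells does not preserve that order, so the same invariance should not be assumed for scale.data.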

Given the way AUCell uses aucMaxRank for scoring gene set activation, I think having more genes in your input will be helpful, not to mention that gene set enrichment typically benefits from information from both highly variable genes and less variable genes. But I am interested to hear what others have to say on the matter.
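As a rough illustration of why the number of input genes matters, the Python sketch below compares a 3000-gene matrix with a hypothetical 20000-gene one. It assumes the rank cutoff defaults to the top 5% of genes, rounded up, which is my understanding of AUCell's default for aucMaxRank. The helper names, gene names and pathway are invented for the example.

```python
import math

def default_max_rank(n_genes, top_percent=5):
    """Rank cutoff taken as the top `top_percent` of genes, rounded up."""
    return math.ceil(n_genes * top_percent / 100)

def coverage(gene_set, matrix_genes):
    """Return (genes of the set present in the matrix, size of the set)."""
    present = set(gene_set) & set(matrix_genes)
    return len(present), len(gene_set)

# Hypothetical gene universe; the first 3000 play the role of the HVGs.
all_genes = [f"gene{i}" for i in range(20000)]
hvg_genes = all_genes[:3000]

# Hypothetical pathway with two members outside the HVG set.
pathway = ["gene10", "gene2500", "gene4000", "gene15000"]

full_cov = coverage(pathway, all_genes)
hvg_cov = coverage(pathway, hvg_genes)

print("cutoff with all genes:", default_max_rank(len(all_genes)))
print("cutoff with HVGs only:", default_max_rank(len(hvg_genes)))
print("pathway genes found (all genes):", full_cov)
print("pathway genes found (HVGs only):", hvg_cov)
```

With only the highly variable genes, the cutoff shrinks and pathway members missing from the matrix can never contribute to the score, so scores computed on the two inputs are not directly comparable.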

All this to say, you can reasonably use the raw counts (i.e. assay "RNA" and slot "counts") from your integrated object for AUCell input.