Running AUCell on integrated scRNAseq data
20 months ago
Alex Gibbs ▴ 90

Hi everyone,

Posting here as I can't seem to find a definitive answer.

I would like to assess cell-level pathway activity using AUCell on my integrated dataset. The vignette says to use the 'data' slot, which for me is a 3000 x 10024 (genes x cells) expression matrix. However, this expression matrix consists of scaled data (using 3000 highly variable genes).

I am wondering whether using this data as the input for AUCell will limit my results. Should I be using all genes for every cell instead? If so, how do I obtain a normalised expression matrix containing every gene for every cell from my integrated Seurat object without having to rerun the analysis and subsequently losing the clustering?

Thanks for your help and advice in advance.

Alex

scRNA-seq AUCell SCENIC Seurat • 2.7k views

Hi there!

I have the same question. How did it go? I ran my analysis on both the integrated data and a subset of it, and AUCell seems to give different results when I view the scores on my UMAP plot under the same scale. I would really appreciate any feedback about this analysis.

Thank you


Please start a new post and provide code and figures to better explain the issue.

20 months ago
jv ★ 1.8k

The Seurat data slot doesn't hold "scaled" data; that's what the scale.data slot is for. See https://github.com/satijalab/seurat/wiki/Assay#slots for more details. My guess is that you are referring to "normalized" data.

The above point aside, it sounds like you are specifically looking at the "integrated" assay, which only includes the highly variable genes used for integration. However, AUCell doesn't care about the integration, and as noted in the tutorial:

Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure.

Given the way AUCell uses aucMaxRank for scoring gene set activation, I think having more genes in your input will be helpful, not to mention that gene set enrichment typically benefits from information from both highly variable genes and less variable genes. But I am interested to hear what others have to say on the matter.

All this to say, you can reasonably use the raw counts (i.e. assay "RNA" and slot "counts") from your integrated object for AUCell input.
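
To make the ranking argument concrete, here is a minimal Python sketch of an AUCell-style score for a single cell. It is not the AUCell implementation: the function `recovery_auc`, the toy gene counts, the gene set, and the HVG list are all made up for illustration, and ties are broken by gene name rather than randomly. It only shows that a rank-based score is unchanged by a monotonic per-cell normalization, and that restricting the matrix to HVGs can drop gene set members before scoring even starts.

```python
import math


def recovery_auc(expression, gene_set, auc_max_rank):
    """Simplified rank-based score for ONE cell (illustrative, not AUCell itself)."""
    # Rank genes from highest to lowest expression; ties broken by gene name
    # (an assumption made here to keep the example deterministic).
    ranked = sorted(expression, key=lambda g: (-expression[g], g))
    hits = 0
    area = 0
    for gene in ranked[:auc_max_rank]:
        if gene in gene_set:
            hits += 1
        area += hits  # recovery curve height at this rank
    # Largest possible area: every gene set member sits at the very top.
    n = min(len(gene_set), auc_max_rank)
    max_area = n * (n + 1) // 2 + n * (auc_max_rank - n)
    return area / max_area if max_area else 0.0


# Hypothetical raw counts for one cell.
counts = {"CD3E": 12, "CD3D": 9, "ACTB": 40, "GAPDH": 25,
          "MS4A1": 0, "CD79A": 1, "LYZ": 3, "IL7R": 5}
gene_set = {"CD3E", "CD3D", "IL7R"}

# Library-size scaling followed by log1p is monotonic within a cell,
# so the gene ranking (and therefore the score) does not change.
total = sum(counts.values())
log_norm = {g: math.log1p(c / total * 1e4) for g, c in counts.items()}

raw_score = recovery_auc(counts, gene_set, auc_max_rank=4)
norm_score = recovery_auc(log_norm, gene_set, auc_max_rank=4)
print(raw_score, norm_score)

# Restricting to a (hypothetical) HVG list removes gene set members entirely.
hvgs = {"CD3E", "ACTB", "MS4A1", "LYZ"}
hvg_only = {g: c for g, c in counts.items() if g in hvgs}
missing = gene_set - set(hvg_only)
print(sorted(missing))
```

In Seurat itself, the full count matrix is typically pulled from the integrated object with `GetAssayData(obj, assay = "RNA", slot = "counts")`, where `obj` is a placeholder name. The argument naming depends on the Seurat version, so check the documentation for the one you have installed.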


Thanks for your reply, jv!

Yes, apologies, I wrote this post in a bit of a frantic rush. I meant to say that I am looking at the integrated data object, as I have already performed clustering and UMAP and wanted to overlay the GSEA results on top of the UMAP. I did notice that sentence in the tutorial but was questioning the methodology: if I feed it the scaled data (3000 HVGs for each cell), then it can only run its ranking procedure on those genes.

I was thinking the same thing, but a lot of people seem to frown upon using raw data for certain analyses, so I thought it was a good idea to ask on here. Thanks for your input. I think I will try this and let you know how it goes!
