Question

Understanding dimred in scater for scRNASeq


6 months ago

sp • 0

Hello,

I am new to scRNASeq analysis. Everything below was done in R, and all functions were run with their default settings.

I am trying to use fastMNN only up to the data integration step, for a comparative study with other tools. I am following a fairly standard workflow that I pieced together from the example code in the documentation:

1. Read in the matrix as a SingleCellExperiment (sce).
2. Normalised it with logNormCounts().
3. Performed feature selection using modelGeneVar().
4. Selected the top n HVGs with getTopHVGs().
5. Performed PCA and UMAP on the sce using runPCA() and runUMAP(). I believe the results are stored in the "PCA" and "UMAP" reduced dimensions of the sce.
6. Visualised the UMAP using plotReducedDim(), which I believe is equivalent to plotUMAP() and similar functions, except that dimred is required (I set it to "UMAP").
7. Performed data integration using fastMNN() after subsetting to the chosen HVGs.
8. Ran PCA and UMAP again on sce_integrated, except that now dimred HAS to equal "corrected". I don't understand this.
9. Plotted the UMAP for sce_integrated, again with plotReducedDim(), to compare it with the one from before integration. I was not sure whether dimred should be "UMAP" or "corrected": I believe the embeddings are stored in "corrected", so shouldn't "corrected" be used for visualisation as well? However, when I plot with dimred="UMAP", the UMAP differs from the earlier one. Does that mean the embeddings get overwritten?

Summary of doubts:

1. I don't understand why PCA and UMAP need to be run twice, before and after integration.
2. Why is dimred="corrected" needed for runPCA() after data integration? (Earlier, sce <- runPCA(sce, ncomponents = 50) worked.)
3. For plotting UMAPs, should dimred="corrected" be used after data integration?
4. Do the UMAP embeddings in the sce get overwritten if UMAP is run again after data integration?

Thanks for all your help~ Sorry about the long post, I wanted to provide as much context as possible.

SingleCellExperiment fastmnn scater scRNASeq • 501 views

updated 6 months ago by jared.andrews07 ★ 18k • written 6 months ago by sp • 0

score 2 · Accepted Answer · 2024-06-26

A few clarifications.

1. runPCA() and runUMAP() don't have to be run before fastMNN(). It just makes sense to do so, so that you have a pre-integration set of dimensionality reductions to compare against.
2. After running fastMNN(), you run UMAP again to see the effects of the integration. And yes, you want to do this on the "corrected" embeddings, which are already batch-corrected PCA components (so you don't have to run PCA again). Read the details of ?fastMNN for more info. If you didn't pass dimred="corrected", runUMAP() would not use the corrected values at all: by default it looks for a "logcounts" assay, and as far as I can tell the object returned by fastMNN() only carries the "corrected" reduced dimensions and a "reconstructed" assay, so you would most likely get an error rather than a UMAP of the original PCA.
3. When running runUMAP() again, you should specify the name parameter, e.g. name="corrected_UMAP", so that you can easily compare the original and integrated UMAPs. That name is then what you pass as dimred when plotting. Note that "corrected" itself holds the corrected PCA coordinates, not a UMAP, so plotting dimred="corrected" would only show its first two components.
4. If you just run runUMAP() multiple times without changing name, then yes, it will overwrite the embeddings in "UMAP". As described in point 3, though, you can name the embeddings whatever you want and refer to them by that name when plotting. You can carry along as many embeddings as you want, which is useful for comparing sets of parameters, e.g. different n_neighbors or min_dist values across multiple runs of runUMAP().
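
To make the points above concrete, here is a minimal sketch of the whole workflow. It is not the original poster's code: the toy data from mockSCE(), the two batch labels "A" and "B", the HVG count of 500, and the names "UMAP_uncorrected", "UMAP_corrected" and "UMAP_corrected_nn30" are all assumptions chosen for illustration.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # modelGeneVar(), getTopHVGs()
library(scater)     # runPCA(), runUMAP(), plotReducedDim()
library(batchelor)  # fastMNN()

set.seed(100)

# Toy data standing in for the real count matrix (assumption).
sce <- mockSCE(ncells = 200, ngenes = 2000)
# Hypothetical batch labels; real data would carry its own batch column.
sce$batch <- rep(c("A", "B"), each = 100)

# Normalisation and feature selection.
sce <- logNormCounts(sce)
dec <- modelGeneVar(sce)
hvgs <- getTopHVGs(dec, n = 500)

# Before integration: PCA on the log-counts, then UMAP on that PCA.
# This is only needed for the "before" picture.
sce <- runPCA(sce, subset_row = hvgs)
sce <- runUMAP(sce, dimred = "PCA", name = "UMAP_uncorrected")

# Integration: returns a NEW object holding the "corrected" reduced
# dimensions (batch-corrected PCA coordinates).
sce_integrated <- fastMNN(sce, batch = sce$batch, subset.row = hvgs)

# After integration: no second PCA; run UMAP directly on "corrected"
# and store it under its own name.
sce_integrated <- runUMAP(sce_integrated, dimred = "corrected",
                          name = "UMAP_corrected")

# A second run with a different n_neighbors, stored under another name,
# so nothing is overwritten.
sce_integrated <- runUMAP(sce_integrated, dimred = "corrected",
                          name = "UMAP_corrected_nn30", n_neighbors = 30)

# Lists every embedding currently stored in the integrated object.
reducedDimNames(sce_integrated)

# Plot each embedding by the name it was stored under.
p_before <- plotReducedDim(sce, dimred = "UMAP_uncorrected",
                           colour_by = "batch")
p_after <- plotReducedDim(sce_integrated, dimred = "UMAP_corrected",
                          colour_by = "batch")
```

Printing p_before and p_after (or combining them with a plotting helper of your choice) gives the before/after comparison. Keeping each embedding under a distinct name is what avoids the overwriting described in point 4.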