Should I scale all genes in single cell Seurat?
13 months ago
synat.keam

Apologies for the many posts this week. In Seurat, should I scale all genes for downstream analysis, or is it okay to scale just some features (e.g. the variable features)? I am a bit unclear when it comes to scaling. I have attached my code below.

Also, how do I know which genes are noise/confounding genes? Could you point me to an R script that looks for noise genes and shows how to filter them out? I really appreciate your help so far!

library(Seurat)

data.filt <- NormalizeData(data.filt)


#cell cycle 

# S-phase genes go to s.features and G2/M genes to g2m.features
data.filt <- CellCycleScoring(data.filt, s.features = cc.genes$s.genes, g2m.features = cc.genes$g2m.genes, set.ident = TRUE)

VlnPlot(data.filt, features = c("S.Score", "G2M.Score"), group.by = "orig.ident",
    ncol = 4, pt.size = 0.1)

data.filt<- FindVariableFeatures(data.filt, selection.method = "vst", verbose = FALSE, nfeatures = 2000)

# Scale data 
all.genes<- rownames(data.filt)


# Option 1: scale only the variable features (ScaleData's default)

data.filt <- ScaleData(data.filt,
                       vars.to.regress = c("nCount_RNA", "nFeature_RNA", "percent_mito",
                                           "percent_ribo", "S.Score", "G2M.Score"),
                       verbose = FALSE)

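# Option 2: scale all genes
data.filt <- ScaleData(data.filt,
                       vars.to.regress = c("nCount_RNA", "nFeature_RNA", "percent_mito",
                                           "percent_ribo", "S.Score", "G2M.Score"),
                       # use all.genes directly: rownames(all.genes) is NULL for a
                       # character vector, so ScaleData would fall back to the variable features
                       features = all.genes,
                       verbose = FALSE)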

# PCA

data.filt<- RunPCA(data.filt, verbose = FALSE)
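For reference, here is roughly what ScaleData() does when vars.to.regress is set. For each gene, it fits a linear model of expression on the listed covariates and keeps the residuals. It then centres and scales those residuals. ScaleData's default model.use is "linear", and values above scale.max (10 by default) are capped.

The base-R code below is a simplified sketch of that idea on simulated data, not Seurat's actual implementation. The data, the covariate values and the regress_and_scale() helper are all hypothetical.

```r
# Simplified illustration only; not Seurat's internal code.
set.seed(1)
n_genes <- 50
n_cells <- 200

# Hypothetical per-cell covariates, standing in for meta.data columns
meta <- data.frame(
  nCount_RNA   = rpois(n_cells, 5000),
  percent_mito = runif(n_cells, min = 0, max = 10)
)

# Hypothetical log-normalised expression (genes x cells) with a built-in
# technical dependence on percent_mito
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expr <- expr + outer(rep(0.3, n_genes), meta$percent_mito)

# Hypothetical helper: regress each gene on the covariates, then z-score it
regress_and_scale <- function(expr, meta, vars, scale_max = 10) {
  covars <- meta[, vars, drop = FALSE]
  # 1. Per gene: fit expression ~ covariates and keep the residuals
  resid <- t(apply(expr, 1, function(y) {
    residuals(lm(y ~ ., data = data.frame(y = y, covars)))
  }))
  # 2. Per gene: centre to mean 0 and scale to sd 1
  z <- (resid - rowMeans(resid)) / apply(resid, 1, sd)
  # 3. Cap large values, mirroring ScaleData's scale.max
  z[z > scale_max] <- scale_max
  z
}

scaled <- regress_and_scale(expr, meta, vars = c("nCount_RNA", "percent_mito"))

# Mean gene/percent_mito correlation before, largest absolute one after
print(c(before = mean(cor(t(expr), meta$percent_mito)),
        after  = max(abs(cor(t(scaled), meta$percent_mito)))))
```

Because every gene is regressed and scaled on its own, the scaled values of the variable genes do not depend on whether the other genes are scaled as well. Scaling all genes with six regressed covariates mainly costs extra time and memory.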
13 months ago

Hi,

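Scaling all features can be useful if you want to plot genes that are not among the 2,000 highly variable genes (HVGs) in a heatmap, because heatmaps such as DoHeatmap() use the scaled data. Other than that, I have never encountered a specific analysis that required all genes to be scaled, though such analyses may exist. By default, RunPCA() uses only the variable features, so the PCA (and the clustering built on it) is the same whether you scale all genes or only the HVGs. So I would say it depends on which downstream analyses you are interested in and what type of input data they require.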
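Centring and scaling act on each gene independently, which is why scaling only the HVGs leaves the PCA unchanged. The base-R sketch below shows this on simulated data. The matrix, the variance-based stand-in for VariableFeatures() and the scale_genes() helper are hypothetical, and prcomp() stands in for RunPCA().

```r
set.seed(42)
n_genes <- 100
n_cells <- 300

# Hypothetical normalised expression matrix (genes x cells)
expr <- matrix(rexp(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Hypothetical helper: z-score each gene (row) on its own
scale_genes <- function(m) {
  (m - rowMeans(m)) / apply(m, 1, sd)
}

# Hypothetical stand-in for VariableFeatures(): the 20 most variable genes
gene_var <- apply(expr, 1, var)
hvgs     <- names(sort(gene_var, decreasing = TRUE))[1:20]
non_hvg  <- setdiff(rownames(expr), hvgs)[1]

scaled_all  <- scale_genes(expr)          # like scaling all genes
scaled_hvgs <- scale_genes(expr[hvgs, ])  # like the default (HVGs only)

# The HVG rows come out identical either way...
same_hvg_values <- isTRUE(all.equal(scaled_all[hvgs, ], scaled_hvgs))
print(same_hvg_values)

# ...so a PCA restricted to the HVGs is identical too (cells in rows)
pca_all  <- prcomp(t(scaled_all[hvgs, ]))
pca_hvgs <- prcomp(t(scaled_hvgs))

# The only difference: non-HVGs have scaled values only in the first case
print(c(in_all = non_hvg %in% rownames(scaled_all),
        in_hvgs = non_hvg %in% rownames(scaled_hvgs)))
```

In Seurat terms, the practical difference is which genes end up in the scale.data slot. To draw a heatmap of a gene outside the HVGs, you would first run ScaleData(data.filt, features = all.genes). Otherwise DoHeatmap() will typically drop that gene from the plot with a warning.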

Regarding your second question, there are more sophisticated analyses one could do. Usually, though, plotting the expression of such genes/features on a PCA gives you a rough idea of whether a gene or cell-level feature (e.g. the percentage of mitochondrial gene counts) represents noise or a confounding variable. Correlating the suspected noise/confounding genes/features with the PCs (principal components) can help to quantify such effects.

The challenge, of course, lies in distinguishing confounding genes/features from biologically meaningful ones. A gene might correlate with PC1 because it drives differentiation, in which case it is biologically meaningful. By contrast, if UMI counts correlate with PC2, that may mean PC2 mostly captures technical noise. The decision always needs to be supported by your expectations, considering the experimental design, biological conditions and cell types, as well as the questions you are trying to answer.
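The base-R sketch below illustrates the PC-correlation idea on simulated data. One block of genes carries a "biological" two-group difference, and another block tracks library size. Everything here is hypothetical: prcomp() stands in for RunPCA(), and the simulated nCount_RNA is not real data. In Seurat, the equivalent inputs would be the PC scores from Embeddings(data.filt, reduction = "pca") and the metadata column data.filt$nCount_RNA.

```r
set.seed(7)
n_genes <- 200
n_cells <- 300

# Hypothetical per-cell covariates: a two-group biological label and UMI counts
group      <- rep(c(0, 1), length.out = n_cells)
nCount_RNA <- round(exp(rnorm(n_cells, mean = log(4000), sd = 0.4)))
lib_effect <- as.numeric(scale(log(nCount_RNA)))

# Simulated expression (genes x cells): genes 1-30 differ between groups
# (biology), genes 31-60 follow library size (technical), the rest are noise
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
expr[1:30, ]  <- expr[1:30, ]  + 3 * outer(rep(1, 30), group)
expr[31:60, ] <- expr[31:60, ] + 1 * outer(rep(1, 30), lib_effect)

# PCA on scaled genes; prcomp() expects cells in rows
pca    <- prcomp(t(expr), center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:5]

# Correlate each of the first PCs with each suspected covariate
covariates <- cbind(group = group, nCount_RNA = nCount_RNA)
pc_cor <- cor(scores, covariates)
print(round(pc_cor, 2))

# Which genes track the technical covariate most strongly?
gene_cor  <- apply(expr, 1, function(g) cor(g, nCount_RNA))
top_genes <- names(sort(abs(gene_cor), decreasing = TRUE))[1:10]
print(top_genes)
```

Here, which covariate is biological and which is technical is known by construction. With real data, a high correlation only flags candidates. Whether a PC or gene reflects noise or biology still has to be judged against the experimental design, as described in the answer above.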

I hope this helps.

Best regards,

António


Thanks, António, for your kind and detailed response. You helped clear up my doubts about scaling!

Kind Regards,

Synat
