Hi All,
I have two groups, WT and KO.
As I understand the JackStraw plot, ‘significant’ PCs will show a strong enrichment of features with low p-values (solid curve above the dashed line).
How should I interpret the JackStraw plot? Why are even the PCs with p-value = 1 above the dashed line?
PC 5 has a p-value of 1. Should I use only the PCs with p-value < 0.05 (PCs 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15) for the downstream analyses?
According to the elbow plot, the standard deviation appears to bottom out at around PC 34 and stay constant from there.
So how many PCs should I use for the downstream analyses (FindNeighbors, FindClusters and RunUMAP)?
cond_integrated <- FindNeighbors(object = cond_integrated, dims = ?)
cond_integrated <- FindClusters(object = cond_integrated)
cond_integrated <- RunUMAP(cond_integrated, reduction = "pca", dims = ?)
Each time I change the number of dimensions, I get a different UMAP clustering. This is my workflow so far:
merged_cond <- merge(x = WT_seurat_obj, y = KO_seurat_obj, add.cell.ids = c("WT", "KO"))
# filtered merged_cond based on mito, etc. to get filtered_cond_seurat
# split filtered_cond_seurat by condition to get split_cond
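for (i in seq_along(split_cond)) {
  # arguments beyond the object itself were not shown in the original post
  split_cond[[i]] <- NormalizeData(split_cond[[i]])
  # s_genes and g2m_genes stand for the cell-cycle gene lists (not shown here)
  split_cond[[i]] <- CellCycleScoring(split_cond[[i]], s.features = s_genes, g2m.features = g2m_genes)
  split_cond[[i]] <- SCTransform(split_cond[[i]])
}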
# obtained integ_features from SelectIntegrationFeatures using split_cond
# prepared split_cond for integration using PrepSCTIntegration with integ_features as anchor features
# obtained integ_anchors using FindIntegrationAnchors with the SCT normalization method
# obtained the cond_integrated Seurat object using IntegrateData
cond_integrated <- RunPCA(object = cond_integrated)
DimHeatmap(cond_integrated, dims = 1:15, cells = 500, balanced = TRUE)
cond_integrated <- JackStraw(cond_integrated, num.replicate = 100, dims = 50)
cond_integrated <- ScoreJackStraw(cond_integrated, dims = 1:50)
JackStrawPlot(cond_integrated, dims = 1:50)
ElbowPlot(object = cond_integrated, ndims = 50)
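One way to make the elbow reading less subjective is to compute a cutoff from the per-PC standard deviations. The sketch below uses a commonly used rule of thumb, not anything built into Seurat: take the smaller of (a) the first PC at which the cumulative share exceeds 90% while the PC's own share is below 5%, and (b) the PC after the last drop of more than 0.1 percentage points between consecutive PCs. The function name `choose_n_pcs`, its thresholds and the toy vector are assumptions for illustration. On a real object the vector would come from `Stdev(cond_integrated, reduction = "pca")`.

```r
# Hypothetical helper: suggest a PC cutoff from per-PC standard deviations.
# The thresholds are rule-of-thumb defaults, not values prescribed by Seurat.
choose_n_pcs <- function(stdev, cum_cutoff = 90, pct_cutoff = 5, delta_cutoff = 0.1) {
  pct <- stdev / sum(stdev) * 100   # each PC's share of the summed stdev (%)
  cum_pct <- cumsum(pct)            # cumulative share (%)

  # (a) first PC with cumulative share above cum_cutoff and own share below pct_cutoff
  co1 <- which(cum_pct > cum_cutoff & pct < pct_cutoff)[1]

  # (b) PC right after the last drop larger than delta_cutoff percentage points
  drops <- pct[-length(pct)] - pct[-1]
  big_drops <- which(drops > delta_cutoff)
  co2 <- if (length(big_drops) > 0) max(big_drops) + 1 else NA_integer_

  min(co1, co2, na.rm = TRUE)
}

# Toy standard deviations standing in for Stdev(cond_integrated, reduction = "pca")
toy_stdev <- c(12, 9, 6, 4, 3, 2.02, 2.01, 2.0, 2.0, 2.0)
n_pcs <- choose_n_pcs(toy_stdev)
print(n_pcs)  # 6 for this toy vector: the curve is flat from PC 6 onwards
```

The result is only a starting point. It is still worth checking that clusters are stable when the cutoff is moved up or down by a few PCs.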
I'd like to revive this thread. I have a situation where a lower number of PCs seems to give me more "biologically relevant" results. Does that justify using a lower number of PCs?
I have the following setup: several time points of a cell differentiation protocol, each from a different library. (I know this is far from an ideal setup. It was done this way because the wet-lab protocol is complicated, and it should not prevent me from first analysing each time point individually and then trying to connect the time points using the biological prior knowledge obtained.)
I'm performing UMAP dimensionality reduction on a subset of my data to see the overall structure. I've noticed that a low number of PCs (5) gives better time point-to-time point clustering than a higher number of PCs (15). That is probably due to the batch effect being amplified with a higher number of PCs. Would it be meaningful in that case to use a low number of PCs, and maybe additionally perform clustering with a higher number of PCs within each individual time point later on?
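To check which PCs track the library or time point labels, one option is to regress each PC's cell scores on the label and look at R². The sketch below does this in base R on simulated data. The helper `pc_batch_r2` and the toy embedding are assumptions for illustration. On a real object the matrix would come from `Embeddings(obj, reduction = "pca")` and the labels from a metadata column. Because each time point here is its own library, a high R² cannot tell batch apart from biology. It only shows which PCs carry the difference between libraries.

```r
set.seed(1)

# Toy labels: three libraries/time points with 50 cells each (simulated)
batch <- factor(rep(c("day0", "day2", "day4"), each = 50))

# Toy embedding: PC_1 carries a shift between libraries, PC_2 is pure noise
pcs <- cbind(
  PC_1 = rnorm(150) + 3 * as.integer(batch),
  PC_2 = rnorm(150)
)

# Hypothetical helper: fraction of each PC's variance explained by the label
pc_batch_r2 <- function(embeddings, batch) {
  apply(embeddings, 2, function(score) summary(lm(score ~ batch))$r.squared)
}

r2 <- pc_batch_r2(pcs, batch)
print(round(r2, 2))
```

PCs with R² near 1 separate the libraries almost perfectly, and PCs with R² near 0 carry structure that is shared across them.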
Any advice will be appreciated.
Best, Eugene
[Figures: elbow plot; UMAP with 5 PCs; UMAP with 15 PCs]
Please do not add new questions as an answer to existing threads. If you feel your question is unique then create a new post for it.
Not sure what you are looking at to come up with this assessment, but the eyeball test says that clustering is much better with 15 PCs. Not only are clusters better separated globally, but red and blue are better separated locally, as are the cyan and magenta groups.
The point here is that I have some prior knowledge of what these cells are. Since these populations lie along a differentiation trajectory, it is safe to assume that consecutive days should be closer to one another than more distant time points, and that is exactly what I see with 5 PCs. With 15 PCs, on the other hand, the time points are just scattered across the UMAP components.
And my point is that they are not supposed to be arranged in any kind of trajectory that mimics their differentiation pattern. They are supposed to be well separated, which they are with 15 PCs. You are expecting too much from dimensionality reduction if you think that it is going to recapitulate the differentiation pattern.
There aren't 6 expected clusters with 5 PCs - there are 4 at most. If you didn't know their colors ahead of time, there is no way you'd be able to come up with a correct number of clusters. On the other hand, 15 PCs is much more informative regarding real clusters, though I wouldn't necessarily guess 6 either if all dots were uniformly colored.
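Rather than judging by eye, the agreement between unsupervised clusters and the known time point labels can be scored with the adjusted Rand index (ARI), computed once per choice of PCs. Below is a minimal base R sketch on made-up labels. The function `adjusted_rand_index` and the toy vectors are assumptions for illustration. On a real object the cluster labels would typically be the `seurat_clusters` metadata column after FindClusters.

```r
# Adjusted Rand index between two labelings of the same cells.
# 1 means identical partitions; values near 0 mean chance-level agreement.
# (Undefined if both labelings put every cell in a single group.)
adjusted_rand_index <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)                       # contingency table of the two labelings
  comb2 <- function(x) x * (x - 1) / 2     # number of pairs, choose(x, 2)
  sum_ij <- sum(comb2(tab))
  sum_a <- sum(comb2(rowSums(tab)))
  sum_b <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(length(a))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Toy example: 12 cells from three known time points
known <- rep(c("d0", "d2", "d4"), each = 4)
clusters_split <- rep(c(2, 0, 1), each = 4)    # recovers all three groups
clusters_merged <- c(rep(0, 8), rep(1, 4))     # merges d0 and d2 into one cluster

ari_split <- adjusted_rand_index(known, clusters_split)
ari_merged <- adjusted_rand_index(known, clusters_merged)
print(c(split = ari_split, merged = ari_merged))
```

Cluster numbering does not matter for the ARI, so the score depends only on which cells are grouped together.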