Why is the most suitable number of PCs for UMAP on my dataset only three?
2.9 years ago
631079064 • 0

As background, UMAP plots of germ cells from scRNA-seq data usually appear continuous.

I have a single-cell transcriptome dataset from fish testis (nearly 95% germ cells), and I have to set the number of PCs used for UMAP to only 3 for the plot to appear continuous as expected. With 4 or more PCs, the topological structure becomes twisted and fragmented, so it no longer shows the developmental stages of the germ cells at all.

I'm sorry that I cannot upload any picture. Please refer to my question on Stack Overflow: https://stackoverflow.com/questions/70494798/why-the-most-suitable-umap-pcs-for-my-scrna-seq-dataset-is-only-three

Thank you for your help!!!

germ cell UMAP • 1.5k views

Why did you do this experiment? How many detectable cell types do you think exist in your testis sample? Why do you think continuity in a UMAP is a requirement of developing cell stages? Are you asking why your data is showing you more than you want to see? As Mensur Dlakic suggests, rather than imposing a dimensionality on your data, determine the dimensionality from the data itself. That is, if you're using the data for discovery. On the other hand, if you've exhausted the data analysis and now want to tell a story involving a subset of PCs, that would be different. Why did you do this experiment?


I'm sorry that I cannot upload any picture.

You can upload pictures using the image icon (the one to the right of the 101010 button).


For the pseudotime analysis part, you can use Monocle.

2.9 years ago
Mensur Dlakic ★ 28k

I don't see why you think that 3 PCs are most suitable. All those plots look fine to me. I don't know much about the scientific problem you are studying, but in terms of visualization I don't think there is any rule that requires UMAP plots to be continuous. UMAP plots in general have nothing to do with biology: they are just low-dimensional representations of high-dimensional data. We try to interpret them so that they match what biology is telling us, which is probably why you think that 3 PCs are more suitable. Someone else may argue that 4 PCs are better because cluster 12 is more clearly separated from the rest.

Depending on the variance captured by 3 PCs versus a larger number, it is quite possible that 4 or 10 PCs are more appropriate because they include more of the variance. I suggest you try whatever number of PCs is required to capture at least 90-95% of the variance. Depending on the number of data dimensions, UMAP may even be able to create this plot from the raw data in a reasonable time.
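To make the 90-95% suggestion concrete, here is a minimal sketch of how one might pick the number of PCs from the per-PC variances. It assumes you can export those variances from your PCA step (for example, the squared standard deviations of the PCs); the function name `n_pcs_for_variance` and the example numbers are made up for illustration and do not come from the dataset in the question.

```python
from itertools import accumulate


def n_pcs_for_variance(variances, target=0.90):
    """Return the smallest number of leading PCs whose cumulative share of
    the supplied variances reaches `target` (a fraction in (0, 1])."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    total = sum(variances)
    if total <= 0:
        raise ValueError("variances must sum to a positive number")
    for n, cumulative in enumerate(accumulate(variances), start=1):
        # Small tolerance so that floating-point rounding does not push
        # the answer one PC too far when the share equals the target.
        if cumulative / total >= target - 1e-12:
            return n
    return len(variances)


# Hypothetical per-PC variances, ordered from PC1 downwards (illustration only).
example_variances = [40.0, 25.0, 15.0, 8.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.5]

n90 = n_pcs_for_variance(example_variances, 0.90)
n95 = n_pcs_for_variance(example_variances, 0.95)
print(n90, n95)
```

Note that the share is computed relative to the PCs you pass in. If your PCA was truncated (say, to the first 50 PCs), the result is a fraction of the variance in those 50 PCs, not of the total variance in the data, so treat the number as a guide rather than a hard rule.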

Lastly, you may want to know that it is frowned upon when the same post is created at multiple websites (for reasons that are not always clear to me).
