A couple of years ago Lior Pachter's group trashed non-linear dimensionality reductions (UMAP/tSNE). He called them "glorified Rorschach tests" and argued that you should never make inferences or draw conclusions based on them. They went so far as to develop a similar method that was able to transform single-cell data into arbitrary shapes of your choice.
Monocle3 is one of the most used tools for single cell trajectory analysis. As far as I understand from their methods, it calculates the trajectory based on the UMAP reduction. Both Monocle and velocyto are mentioned in the Chari paper, but they put the focus on velocity. In contrast, other methods like slingshot calculate the trajectory first, and then project it to the UMAP-space (or whatever dimRed you want).
What is the present state of this controversy? What are the current best practices for these analyses?
How can you measure/detect whether your sc data has so much "hidden complexity" that basing the trajectory analysis on a dimRed will not give you adequate results?
I work with t-SNE on a daily basis, and also with UMAP but less frequently. My interest is not scRNAseq analysis but rather metagenomic binning, but dimensionality reduction methods don't care about the source of data. After having done this hundreds of times, I can state with certainty that clusters of dots correspond to real biological entities, and I am sure the same is true for sc-type analyses. There are numerous examples of experimentally confirmed cells or organisms that were first postulated in t-SNE/UMAP 2D plots. Non-linear dimensionality reduction methods embed the points in ways that give intuitive visualizations and preserve local distances, but there is no guarantee that the distance between any two random points will be faithfully represented in a t-SNE/UMAP 2D plot. In practice this means that distances between clusters in the embedded space will most likely be preserved for groups that are relatively close to each other, and that is typically good enough for most applications.
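One rough way to put a number on "preserves local distances" is to check, for every point, how many of its k nearest neighbours in the original (or PCA) space are still among its k nearest neighbours in the 2D embedding. Below is a minimal sketch in plain Python. The function names `knn_indices` and `knn_preservation` are made up for illustration, the toy data is random, and the brute-force neighbour search is only practical for small inputs. Points with a low score sit in regions where the 2D picture should not be trusted.

```python
import math
import random


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours


def knn_preservation(high_dim, low_dim, k=10):
    """Per-point fraction of high-dimensional k nearest neighbours kept in the embedding."""
    hi = knn_indices(high_dim, k)
    lo = knn_indices(low_dim, k)
    return [len(a & b) / k for a, b in zip(hi, lo)]


# Toy data (assumption: 60 random points in 10 dimensions stand in for cells x PCs).
random.seed(0)
high = [[random.gauss(0, 1) for _ in range(10)] for _ in range(60)]

# A "perfect" embedding (identical coordinates) versus a deliberately scrambled one,
# where each point is given the first two coordinates of some other random point.
same = [row[:] for row in high]
scrambled = [row[:2] for row in random.sample(high, len(high))]

faithful_scores = knn_preservation(high, same, k=5)
scrambled_scores = knn_preservation(high, scrambled, k=5)

print("mean preservation, identical coordinates:", sum(faithful_scores) / len(faithful_scores))
print("mean preservation, scrambled embedding:  ", sum(scrambled_scores) / len(scrambled_scores))
```

With real data you would pass the PCA coordinates as `high_dim` and the UMAP/t-SNE coordinates as `low_dim`, both in the same cell order. If scikit-learn is available, `sklearn.manifold.trustworthiness` computes a related, rank-weighted score.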
The problem arises when one tries to understand the dynamics in sc-type datasets. As far as I know, there is no formal proof that the way t-SNE/UMAP spread the data points around has anything to do with biology. I think people believe those trajectories because they often make sense, and in many instances they might be correct. However, it is very likely that in some instances the order and connectivity of clusters have nothing to do with biology, so concerns raised by Pachter and others are legitimate. Generally speaking, too many people use methods whose pros and cons they do not understand. The lure of dimensionality reduction methods is especially dangerous because they produce such visually appealing representations that some people automatically assume they must be correct.
I generally avoid these Twitter-borne discussions since often enough people stir up a lot of noise for the sake of being heard, and then others jump in to fight about it, none of which really helps the scientific interaction. My personal take, like with any method, is to simply give it a try and see if it suggests potentially interesting results. Most of the time these dimensionality reductions are mere visualizations. For me personally it helps a lot to simply "look" at a 2D representation of my data when checking markers etc. The overall shape does not really matter, nor does the distance between points. Still, we have one project in which the spatial aspects of the UMAP and the cell density in certain areas suggested an interesting phenomenon that we followed up and validated in the lab, so here it gave valuable insights. Just try and see, like with any method.
Yes, the thing is that I already tried both Monocle3 and Slingshot, and they are giving me completely different results (and there isn't much literature about the issue at hand to tell right from wrong). Where slingshot finds 3 trajectories (with some weirdness), monocle insists on placing a cluster of cells (that appears as a terminal leaf in only 1 slingshot path) in the middle of its trajectory. And I suspect it only does so because of where that cluster happens to fall in the UMAP.
I'm in the process of trying 2-3 other trajectory methods to see which of the flavors of weird seems more robust.
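When comparing several trajectory methods like this, one simple sanity check is the rank correlation between the pseudotime values two methods assign to the same cells along a shared lineage. Here is a minimal sketch in plain Python, not something taken from either package. The names `average_ranks` and `spearman` and the example vectors are hypothetical; with real data you would export the pseudotime vectors from each tool and drop cells that are not assigned to the lineage being compared. Since the direction of pseudotime is arbitrary, the absolute value of the correlation is what matters.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical pseudotime values for the same six cells from two methods.
pseudotime_a = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0]
pseudotime_b = [12.0, 10.5, 7.0, 6.0, 2.0, 0.5]  # same ordering, reversed direction

rho = spearman(pseudotime_a, pseudotime_b)
print("Spearman rho:", rho, "| agreement ignoring direction:", abs(rho))
```

A high absolute correlation only says the two methods order the cells similarly; it does not say either ordering is biologically right. If SciPy is available, `scipy.stats.spearmanr` does the same calculation.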
Yeah, like I said, there really is no “best practices” method so it’s hard to say whether something is right or wrong. So my sort of nonanswer to your question of Monocle3 vs Slingshot is that I don’t really think anyone can say which one (if either) is giving you the right result.
Just do some exploratory work using however many / whatever methods you want, but don’t draw conclusions unless you are confident.
As the blog I linked to says (by Irizarry), these methods are powerful tools for exploratory data analysis (EDA), but not really for drawing conclusions. I’m a visual person so it’s definitely super useful for me to have these different ways of visualizing data for EDA.
My take: The single cell field has lots of problems / shortcomings (especially for methods that go beyond simple exploratory work), but not a lot of easy solutions. A simple “click of a button” method to do a “best practices” analysis doesn’t really exist.
Thanks! It looks very interesting. Unfortunately, for the analysis I have presently at hand, I didn't separate the data into pre-/post-splicing. We are up against an internal deadline and I won't be able to re-run the whole thing from scratch. But I'll give it a try on my next scRNA project.
Thanks for your point of view.