Question

Force directed graphs compared to other dimensionality reduction methods for scRNA-seq

7

5.3 years ago

MutationalMeltdown ▴ 200

For scRNA-seq visualisation of the transcriptional profiles of cells, people usually run PCA followed by a non-linear dimensionality reduction technique such as t-SNE or UMAP. However, other methods have been suggested for visualisation, including diffusion maps and force directed graphs (FDGs). Diffusion maps are said to be good for cell lineage tracing, but a number of prominent studies (e.g. this one) also use FDGs. Why might FDGs be suitable for showing cell differentiation lineages? Can someone give me a conceptual explanation comparing FDGs, diffusion maps, t-SNE and UMAP? Although the methods are rather different mathematically, they can all be applied to scRNA-seq data in a similar way through packages like Scanpy. Thanks

RNA-Seq scRNA-Seq single cell machine learning • 6.6k views

updated 5.2 years ago by Jean-Karim Heriche 27k • written 5.3 years ago by MutationalMeltdown ▴ 200

score 6 · Answer 1 · 2020-01-30

6

5.2 years ago

Jean-Karim Heriche 27k

What you refer to as FDG is not a dimensionality reduction method. It is a family of graph layout algorithms, i.e. algorithms for drawing graphs that treat edges as spring-like and apply forces to the nodes: typically, repulsive forces act between all nodes and attractive forces act between adjacent nodes. Variants result from different choices of forces, e.g. based on electrical, magnetic or inertial properties. These layouts are typically used when the data is encoded in a similarity matrix that one can view as the adjacency matrix of a graph. Edge weights are taken as the strength of the relation between nodes, so the more similar two nodes are, the closer they will be to each other in the drawing, which can reveal some cluster structure.
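
To make the "springs plus repulsion" idea concrete, here is a minimal NumPy sketch. It assumes a Fruchterman-Reingold-style force model (repulsion ideal²/d between every pair, attraction w·d²/ideal along each weighted edge) and a Gaussian-weighted k-nearest-neighbour graph as the similarity matrix. The function names, the bandwidth heuristic and the cooling schedule are illustrative choices, not the implementation used by Scanpy or any graph library.

```python
import numpy as np


def knn_similarity(X, k=5, sigma=None):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                               # a cell is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]                       # indices of the k nearest cells
    nbr_d2 = np.take_along_axis(d2, nbrs, axis=1)
    if sigma is None:
        sigma = np.sqrt(np.median(nbr_d2))                     # bandwidth heuristic (assumption)
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = np.exp(-nbr_d2.ravel() / (2 * sigma ** 2))
    return np.maximum(W, W.T)                                  # keep an edge if either end chose it


def force_layout(W, n_iter=200, seed=0):
    """Fruchterman-Reingold-style layout: all pairs repel, weighted edges attract."""
    n = W.shape[0]
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n, 2))
    ideal = np.sqrt(4.0 / n)          # "ideal" spacing for n nodes in a 2 x 2 box
    temp = 0.2                        # maximum step per iteration, shrinks over time
    for _ in range(n_iter):
        delta = pos[:, None, :] - pos[None, :, :]             # vectors from node j to node i
        dist = np.maximum(np.linalg.norm(delta, axis=-1), 1e-9)
        # positive = push i away from j, negative = pull i towards j
        mag = ideal ** 2 / dist - W * dist ** 2 / ideal
        np.fill_diagonal(mag, 0.0)                            # no self-interaction
        disp = ((delta / dist[..., None]) * mag[..., None]).sum(axis=1)
        step = np.maximum(np.linalg.norm(disp, axis=1, keepdims=True), 1e-9)
        pos += disp / step * np.minimum(step, temp)           # move along the net force, capped
        temp *= 0.98                                          # cool down
    return pos
```

Usage: `pos = force_layout(knn_similarity(X))` for a cells × features matrix `X`. Nothing is projected here: the 2-D coordinates depend only on the graph, so whatever structure the similarity matrix encodes (or fails to encode) is what the drawing shows.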
Dimensionality reduction methods do what their name implies, i.e. they try to project the data into a lower-dimensional space; the difference between them essentially lies in the objective function they optimize. PCA tries to maximize variance, multi-dimensional scaling tries to preserve pairwise Euclidean distances, and diffusion maps preserve the diffusion distance (by simulating heat diffusion on the graph associated with the similarity matrix). For the objective functions behind t-SNE and UMAP, check this UMAP for t-SNE post. Dimensionality reduction aims to get rid of noisy dimensions in the data, so it can be useful for revealing clusters. However, clustering in the new space should be done with caution, as the chosen dimensionality reduction algorithm may not always preserve the relevant structures (for example, one can imagine two clusters that are separated only along a variable with low variance; PCA would then probably miss them).
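
For comparison with the graph layout, here is a minimal dense-kernel diffusion map sketch in NumPy. It assumes a Gaussian kernel with a median-distance bandwidth (an illustrative heuristic), row-normalises it into a random-walk transition matrix P, and uses P's leading non-trivial right eigenvectors scaled by eigenvalue^t as coordinates, so that Euclidean distance between these coordinates approximates the diffusion distance at time t. Libraries such as Scanpy typically work from a sparse neighbourhood graph instead, so treat this as an illustration of the idea, not their implementation.

```python
import numpy as np


def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Dense-kernel diffusion map; returns (coordinates, eigenvalues)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean distances
    if epsilon is None:
        epsilon = np.median(d2)               # bandwidth heuristic (assumption)
    K = np.exp(-d2 / epsilon)                 # Gaussian similarity kernel
    deg = K.sum(axis=1)
    # P = D^-1 K is the random-walk transition matrix; its symmetric
    # conjugate S = D^-1/2 K D^-1/2 has the same eigenvalues and is easier to diagonalise
    S = K / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]           # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(deg)[:, None]       # right eigenvectors of P
    # skip the trivial pair (eigenvalue 1, constant eigenvector) and scale by lambda^t
    coords = psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
    return coords, evals
```

On cells sampled along a single continuous trajectory, the first diffusion coordinate tends to vary monotonically along that trajectory, which is one reason diffusion maps are popular for lineage and pseudotime work.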

1

This preprint is illuminating https://arxiv.org/abs/2007.08902

4.8 years ago by MutationalMeltdown ▴ 200

0

Looks interesting. Will read it. Thanks.

4.8 years ago by Jean-Karim Heriche 27k

0

Thanks. The Scanpy docs describe force-directed graph drawing as "An alternative to tSNE that often preserves the topology of the data better", which I interpreted as meaning that FDGs are used for dimensionality reduction plus visualisation; otherwise, how would it be an alternative to tSNE? Also, why are FDGs used for displaying cell lineage trajectories?

5.2 years ago by MutationalMeltdown ▴ 200

1

Force directed graph layouts are an alternative to t-SNE because t-SNE is mostly useful for visualization. As for why a graph layout algorithm may display cell lineages better, it's because this information may be better represented in the graph than in the data used as input to t-SNE. The trick is to build a suitable similarity matrix to use as the adjacency matrix of the graph, or to find a suitable algorithm to extract/preserve this information from the data. Doing PCA before t-SNE is probably a good way of losing this information.
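
Whether PCA actually discards lineage information in a given dataset is an empirical question, but the mechanism behind the low-variance caveat can be shown with a toy NumPy sketch. The data, the `pca` helper and the `separation` score below are all made up for illustration: two groups differ only along a low-variance feature, while an unrelated feature carries most of the variance, so the top principal component follows the noise.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the centred data; returns scores and per-component variance."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, s[:n_components] ** 2 / (len(X) - 1)


def separation(z, labels):
    """Gap between the two group means in units of the pooled standard deviation."""
    a, b = z[labels == 0], z[labels == 1]
    return abs(a.mean() - b.mean()) / np.sqrt((a.var() + b.var()) / 2)


rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)
# feature 0: small variance overall, but it is the only one that separates the groups
informative = np.where(labels == 0, -1.0, 1.0) + rng.normal(0.0, 0.1, n)
# feature 1: large variance, unrelated to group membership
noise = rng.normal(0.0, 5.0, n)
X = np.column_stack([informative, noise])

scores, variances = pca(X, n_components=1)
print(f"PC1 variance: {variances[0]:.1f}")
print(f"separation on PC1:       {separation(scores[:, 0], labels):.2f}")
print(f"separation on feature 0: {separation(X[:, 0], labels):.2f}")
```

Real pipelines keep tens of components rather than one, so this exaggerates the effect; it only shows that "highest variance" and "most informative" need not coincide.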

5.2 years ago by Jean-Karim Heriche 27k