I've decided to write a program to compute dimension reduction, and potentially clustering, specifically for single-cell RNA-seq. I got tired of running PCA/t-SNE in R, where I do most of my downstream analysis, because t-SNE in particular takes a long time to run. I typically run t-SNE with several different perplexity values to see which visualizations I like best, which is computationally intensive. I know that C++ and Python are both significantly faster than R. My question is: if I wrote this program in one of those languages and called it from R, using either Rcpp or reticulate, which should I use if my main aim is speed? I'm confident in my ability to write the code in either language, but I'd rather not do it twice, and I'm not familiar with the speed or ease of integration of either package.
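For context on why every perplexity setting costs a full run: in standard t-SNE the perplexity fixes, for each point, the width of the Gaussian used to turn distances into neighbour probabilities, and that width is found by a per-point binary search before the embedding is optimised. Below is a minimal sketch of that calibration step for a single point in plain Python. The function names (conditional_probs, perplexity, calibrate_beta) and the toy distances are illustrative assumptions, not the API of Rtsne, bh_tsne, or any other package.

```python
import math


def conditional_probs(sq_dists, beta):
    """Gaussian similarities p(j|i) for one point, with precision beta = 1 / (2 * sigma^2)."""
    # Subtracting the minimum distance avoids underflow; it cancels after normalisation.
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity is 2 raised to the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists, target_perplexity, tol=1e-5, max_iter=200):
    """Binary-search the precision so this point's perplexity matches the target."""
    lo, hi = 0.0, math.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists, beta))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            # Distribution too flat: raise the precision (narrower Gaussian).
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:
            # Distribution too peaked: lower the precision (wider Gaussian).
            hi = beta
            beta = (beta + lo) / 2.0
    return beta


# Hypothetical squared distances from one cell to six others in PCA space.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
for target in (2.0, 3.0, 5.0):
    beta = calibrate_beta(sq_dists, target)
    print(target, beta, perplexity(conditional_probs(sq_dists, beta)))
```

Changing the perplexity changes these probabilities for every point, so the embedding has to be re-optimised from scratch each time. The PCA step does not depend on perplexity, so it only needs to be computed once and can be reused across all runs.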
Yes, I'm aware of the various implementations of both t-SNE and UMAP and their benefits / drawbacks. My interest is in reducing the computation time of running these algorithms several times, since R is memory-greedy and slow. I plan on simply calling the C++ code provided on van der Maaten's GitHub to run t-SNE after performing PCA. Do you really think that running the dimension reduction step in C++ / Python wouldn't be significantly faster than running in R?
A pure Python implementation of t-SNE is very slow, so I would not recommend that. There are Python bindings that already use the C++ code or the bh_tsne binary, and I suspect the same is true of the R implementation (I have never used it). I would be very surprised if the R implementation of t-SNE were pure R, but I could be wrong. My point was that t-SNE is slow even with the fastest implementations available.
The Rtsne package that everyone uses is an Rcpp implementation of the Barnes-Hut fast t-SNE written by Laurens van der Maaten. The question was less about the algorithm itself and more about the speed of computation in C++ vs. R, since C++ is a lower-level language.
A quick search of GitHub for "t-SNE" retrieves 52 R results. I'd look there first before doing anything else.