6.3 years ago
mk
Given:
- a 25x25 matrix of integers
- an initial 25x3 embedding, generated by PCA
- perplexity 1
- random seed set to 1
- target embedding dimension 3
Run exact t-SNE in both MATLAB, tsne('Algorithm', 'exact'), and R, Rtsne(theta = 0).
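For readers unsure what the perplexity setting controls: t-SNE tunes one Gaussian bandwidth per point so that the perplexity (2 raised to the Shannon entropy) of that point's neighbor distribution matches the requested value. Below is a minimal base-R sketch of the perplexity calculation for a single point. The function name `row_perplexity` and its arguments are made up for illustration and are not part of Rtsne or MATLAB.

```r
# Perplexity of the conditional distribution P(j | i) for one point i.
# d2:   squared distances from point i to every other point (self excluded)
# beta: precision, 1 / (2 * sigma^2); hypothetical argument name
row_perplexity <- function(d2, beta) {
  w <- exp(-beta * (d2 - min(d2)))   # shift by min(d2) for numerical stability
  p <- w / sum(w)                    # normalise to a probability distribution
  p <- p[p > 0]                      # treat 0 * log(0) as 0
  H <- -sum(p * log2(p))             # Shannon entropy in bits
  2^H
}

# Four equidistant neighbours: the distribution is uniform, so perplexity is 4.
row_perplexity(rep(1, 4), beta = 1)

# One much closer neighbour dominates: perplexity is close to 1.
row_perplexity(c(0.01, 50, 60, 70), beta = 1)
```

A very low perplexity therefore means each point effectively attends to about one neighbor, which tends to produce small, tight groups in the embedding.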
Here is the data "A":
0 0 0 0 0 97 0 0 0 0 0 0 0 93 67 0 0 0 24 63 0 81 69 0 63
0 35 18 12 0 36 0 89 0 15 23 69 0 54 56 36 0 0 0 90 0 0 37 0 12
0 31 17 17 64 80 0 0 0 0 0 0 23 0 0 0 0 69 83 78 94 0 0 93 40
0 0 0 0 0 0 0 0 74 76 0 0 70 31 12 60 92 0 99 16 53 19 0 3 0
0 0 7 85 84 0 0 0 0 0 0 0 0 0 0 0 0 0 64 0 0 0 0 4 33
35 0 77 0 52 0 0 0 0 0 64 70 0 5 0 0 48 93 0 0 92 0 0 2 0
0 0 0 0 0 25 58 0 0 0 46 0 0 88 0 79 0 60 0 23 0 0 81 33 62
5 91 65 0 0 0 0 0 38 72 0 0 75 0 0 0 0 21 48 0 0 32 0 0 0
14 0 40 0 35 0 0 81 94 51 21 55 43 0 0 30 0 77 56 0 0 0 85 0 4
9 0 0 0 0 0 0 0 8 0 86 36 98 0 0 64 0 87 0 0 32 0 88 7 23
0 96 100 0 0 55 0 0 0 0 73 0 0 0 0 0 68 51 0 0 81 0 92 0 0
59 0 93 0 75 12 0 22 0 0 0 0 13 67 0 0 0 67 0 0 71 82 0 0 0
0 0 79 16 35 0 0 0 99 84 89 0 0 26 0 99 0 8 65 81 77 97 0 13 0
0 0 7 97 0 0 0 63 51 29 0 0 0 0 39 38 44 0 0 0 23 0 18 79 76
0 50 0 9 31 0 0 0 57 0 0 61 0 0 0 0 0 95 0 82 35 0 38 85 0
12 0 0 0 0 0 64 0 38 80 0 43 0 26 0 0 0 0 0 0 73 17 39 7 93
25 0 29 0 0 0 31 0 0 73 0 0 0 0 0 8 0 98 0 0 66 0 0 0 61
89 0 29 33 65 72 0 0 18 60 0 0 0 0 63 0 36 0 0 0 0 0 0 0 61
0 0 0 0 22 91 0 0 0 0 49 0 0 0 0 54 7 0 0 0 0 0 0 50 0
0 0 82 51 3 0 0 74 0 0 0 100 57 0 83 0 0 0 0 0 0 0 94 89 0
0 0 65 0 23 33 0 13 5 90 0 0 0 0 0 0 0 81 0 10 0 0 0 5 5
0 0 0 0 56 0 0 0 30 0 0 98 0 78 0 63 0 0 12 42 11 0 0 0 0
0 0 0 72 41 0 0 0 0 53 0 0 19 1 0 0 63 0 0 0 0 11 0 15 0
0 0 4 0 0 31 14 0 0 0 0 85 0 100 0 0 0 70 16 30 98 0 31 0 0
0 0 0 38 0 71 0 85 0 0 53 0 87 0 0 51 59 0 0 0 0 0 0 0 24
Here is the initial embedding "Y":
0.254 -0.440 -0.402
0.131 0.095 0.282
0.331 -0.166 0.242
-0.661 -0.449 -0.110
0.256 -0.522 -0.204
-0.126 0.079 -0.333
0.233 0.314 -0.400
-0.653 0.153 0.194
0.099 -0.149 0.602
0.192 -0.492 0.206
0.178 0.286 0.478
-0.092 0.541 0.012
-0.312 -0.013 0.587
0.252 0.505 -0.313
-0.603 0.079 -0.297
-0.059 0.255 0.493
-0.211 -0.412 0.274
0.569 0.253 0.011
0.158 -0.291 0.471
0.146 0.386 0.058
0.684 0.052 0.038
0.374 -0.120 0.110
-0.217 0.691 0.078
-0.367 0.142 -0.042
-0.155 -0.025 -0.597
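The post does not say how the PCA initialization was computed. One plausible way to build a 25x3 starting embedding in base R is sketched below with `prcomp`. The matrix `A_demo` is a random stand-in, not the data from the post, and the names `A_demo` and `Y_demo` are hypothetical. Raw principal component scores on data in the 0 to 100 range would be much larger than the values listed in Y (all within about ±0.7), so the scaling used in the post is presumably different and this sketch is not expected to reproduce Y.

```r
set.seed(42)
# Stand-in 25x25 integer matrix (NOT the matrix "A" from the post).
A_demo <- matrix(sample(0:100, 25 * 25, replace = TRUE), nrow = 25)

# PCA on the centred data; keep the scores on the first three components.
pc <- prcomp(A_demo, center = TRUE, scale. = FALSE)
Y_demo <- pc$x[, 1:3]

dim(Y_demo)   # one row per observation, three columns
```

Whatever scaling is chosen, the same numeric matrix has to be passed to both implementations for the comparison to be meaningful.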
Now generate a t-SNE embedding using R's Rtsne(). Notice that "theta = 0" gives the exact algorithm rather than Barnes-Hut:
M = Rtsne(A, perplexity = 1, Y_init = Y, theta = 0, k = 3, max_iter = 1000, dims = 3, pca = FALSE)
18.3306291 -6.2506041 -62.3268525
15.9386942 -3.1369803 -60.5240895
-55.0534150 -176.3234448 -26.9564002
2.8059395 -26.5137681 60.3847045
36.8003669 116.2724964 -44.7099789
-41.8265624 82.2428220 2.4331879
25.7039137 -31.0979402 47.7070508
6.7846280 -24.5332260 59.7495021
17.6887619 -26.3297157 53.9600569
23.9006623 -30.0315023 49.1130745
-43.6656649 83.6832572 5.8279631
-40.6721409 81.3381834 0.2965082
-0.7597657 -28.2349362 60.9845080
32.4991289 120.2894942 -47.4589986
-57.5001125 -176.7817773 -26.9090314
14.4104082 -15.9172413 61.2646084
12.6062537 -19.6219765 59.9587861
39.8651878 119.5524683 -49.4049012
-52.8444181 -65.8360785 8.1726297
12.0636781 -2.5291677 -62.3328198
11.2964251 -21.9339896 59.2367094
17.6280749 -0.2864139 -56.4086955
36.4835826 118.1396018 -46.6151879
18.4594930 1.1163553 -54.3799843
-50.9437484 -67.2759159 8.9376503
Now we generate an embedding using MATLAB:
M = tsne(A,'Algorithm','exact','NumPCAComponents',0, 'Perplexity', 1, 'InitialY', Y, 'NumDimensions', 3)
1.0e+03 *
0.1524 0.3834 -0.1735
0.1637 0.3731 -0.1781
-0.8293 0.3517 0.1845
0.3783 -0.0525 0.3495
-0.0068 -0.2998 -0.6433
-0.3656 -0.2723 0.1051
0.4675 -0.0788 0.3194
0.3858 -0.0658 0.3431
0.4267 -0.0795 0.3288
0.4579 -0.0790 0.3216
-0.3665 -0.2741 0.0900
-0.3651 -0.2712 0.1151
0.3713 -0.0407 0.3552
-0.0123 -0.2780 -0.6524
-0.8293 0.3517 0.1942
0.3862 -0.1062 0.3291
0.3909 -0.0915 0.3330
0.0137 -0.2877 -0.6476
-1.0500 -0.2379 0.0832
0.1678 0.3622 -0.1670
0.3933 -0.0817 0.3359
0.1718 0.3737 -0.1953
-0.0029 -0.2909 -0.6468
0.1761 0.3740 -0.2044
-1.0499 -0.2378 0.0930
Clearly these embeddings are not equivalent. Given an initial embedding, t-SNE should be repeatable. What am I overlooking?
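One caveat when comparing raw coordinates: the t-SNE cost depends only on pairwise distances between embedded points, so rotating, reflecting or shifting an embedding leaves the cost unchanged, and the overall scale can also differ with optimizer settings. A comparison that ignores these transformations may be more informative than comparing numbers row by row. The base-R sketch below checks whether each point has the same nearest neighbor in two embeddings. The helper `nearest_neighbor` and the matrices `Y1`, `Y2` are hypothetical and use synthetic data, not the results above.

```r
# Index of the nearest other point for every row of an embedding.
nearest_neighbor <- function(Y) {
  D <- as.matrix(dist(Y))   # Euclidean pairwise distances
  diag(D) <- Inf            # a point is not its own neighbour
  unname(apply(D, 1, which.min))
}

set.seed(1)
Y1 <- matrix(rnorm(25 * 3), ncol = 3)      # hypothetical embedding

# Build a second embedding that differs only by rotation, scale and shift.
Q  <- qr.Q(qr(matrix(rnorm(9), 3, 3)))     # random orthogonal 3x3 matrix
Y2 <- 40 * (Y1 %*% Q) + 7

max(abs(Y1 - Y2))                                      # coordinates differ a lot
identical(nearest_neighbor(Y1), nearest_neighbor(Y2))  # neighbour structure agrees
```

To apply this to real results, pass the two coordinate matrices to `nearest_neighbor` and look at the fraction of rows that agree, for example with `mean(nn_r == nn_matlab)`. Note that Rtsne returns a list whose coordinates are in the `Y` element.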
t-SNE (t-Distributed Stochastic Neighbor Embedding) starts from a random initialization by default, which is why each run of the algorithm gives slightly different results. To make the results reproducible in R, you can use
set.seed(x)
where x is a numeric value. I think it is really hard to get identical results across different software packages.
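As a small illustration of the seeding point, the base-R sketch below draws a random starting embedding twice with the same seed and once with a different seed. The function `random_init` is hypothetical, and the standard deviation is an assumed value rather than the one any particular t-SNE implementation uses.

```r
# Small Gaussian random start, similar in spirit to what t-SNE implementations
# use when no initial embedding is supplied (the sd value is an assumption).
random_init <- function(n, dims, seed) {
  set.seed(seed)
  matrix(rnorm(n * dims, sd = 1e-4), nrow = n, ncol = dims)
}

Y_a <- random_init(25, 3, seed = 1)
Y_b <- random_init(25, 3, seed = 1)
Y_c <- random_init(25, 3, seed = 2)

identical(Y_a, Y_b)   # same seed, same starting point
identical(Y_a, Y_c)   # different seed, different starting point
```

A seed is only meaningful within one environment: the same seed number generally does not produce the same random stream in R and in MATLAB, so seeding alone is unlikely to align results across the two.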