Can we use UMAP clustering on bulk data?
2.2 years ago
Info.shi ▴ 30

Hi everyone,

I have bulk transcriptomic data with about 10k genes and 5 replicates. I want to see how the replicates cluster, using UMAP with each replicate represented as a point, but the following call fails with "Error: umap: number of neighbors must be smaller than the number of items":

iris.umap <- umap(matrix)

The input matrix looks like this:

Replicates  MSTRG.714.1   MSTRG.9848.1  MSTRG.8579.1  MSTRG.2154.1  MSTRG.434.1    ................. 
Rep_1       12.1871378    4.648702047   0.125640596   2.512811917   5.905108005    .................
Rep_2       8.69549926    5.864406477   0.101110457   1.213325478   4.246639173    .................
Rep_3       10.3490802    4.704127361   0.188165094   0.376330189   4.327797173    .................
Rep_4       9.803265483   0.710381557   0.284152623   1.420763113   5.967205076    .................
Rep_5       24.94352535   1.950890251   0.139349304   0.975445125   2.508287465    .................

Any suggestions would be appreciated.

R UMAP • 2.5k views

If you have a distance matrix of some sort, I recommend that you try affinity propagation.

2.2 years ago

I think you need:

umap(matrix, n_neighbors=n)

with n less than the number of samples (i.e. < 5).
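
As a rough sketch of what that could look like, assuming the CRAN umap package (which is where this error message appears to come from): the matrix below is simulated stand-in data, not the poster's, and the log2 transform, n_neighbors = 3, init = "random" and random_state = 42 are illustrative choices rather than recommendations.

```r
# Assumes the CRAN 'umap' package is installed: install.packages("umap")
library(umap)

# Simulated stand-in for the real data: 5 replicates (rows) x 1000 genes (columns).
set.seed(1)
expr <- matrix(rexp(5 * 1000, rate = 0.2), nrow = 5,
               dimnames = list(paste0("Rep_", 1:5), paste0("gene_", 1:1000)))

# Log-transform so a handful of highly expressed genes do not dominate the distances.
log_expr <- log2(expr + 1)

# The package default is n_neighbors = 15, which is more than the 5 rows available,
# so it has to be lowered explicitly. init = "random" is an assumption made here to
# avoid relying on the spectral initialisation with so few points.
fit <- umap(log_expr, n_neighbors = 3, init = "random", random_state = 42)

# fit$layout has one row per replicate and two embedding coordinates.
print(fit$layout)
plot(fit$layout, xlab = "UMAP 1", ylab = "UMAP 2", pch = 19)
text(fit$layout, labels = rownames(log_expr), pos = 3)
```

With only 5 points the layout will change noticeably with the seed and with n_neighbors, so it is worth rerunning with a few settings before reading anything into it.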

I'm not 100% sure about what follows. In principle, I don't think it is wrong to apply UMAP to a few samples. But I think PCA would be preferable: it distorts the distances between data points less than UMAP does, and with 5 samples there cannot be many clusters to identify anyway. With many samples and many clusters, UMAP is likely to separate the clusters better than PCA, but the price is that distances in the embedding are hard to interpret. With only 5 samples, it is better to stay with PCA.
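
For comparison, a minimal PCA sketch in base R (prcomp), again on simulated stand-in data rather than the real matrix. The log2 transform and the removal of zero-variance genes are assumptions about sensible preprocessing, not something stated in the question.

```r
# Simulated stand-in for the real data: 5 replicates (rows) x 1000 genes (columns).
set.seed(1)
expr <- matrix(rexp(5 * 1000, rate = 0.2), nrow = 5,
               dimnames = list(paste0("Rep_", 1:5), paste0("gene_", 1:1000)))

# Log-transform, then drop genes that do not vary across replicates.
log_expr <- log2(expr + 1)
keep <- apply(log_expr, 2, var) > 0

# PCA with samples as rows; genes are centred but not scaled here.
pca <- prcomp(log_expr[, keep], center = TRUE, scale. = FALSE)

# Fraction of variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Plot the replicates on the first two components.
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
text(pca$x[, 1], pca$x[, 2], labels = rownames(log_expr), pos = 3)
```

Because PCA is a linear projection, distances between the replicates in this plot relate directly to differences in (log) expression, which is the interpretability advantage mentioned above.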


n less than the number of samples

Technically yes, but n should be well below the number of samples. The number of neighbors sets the size of the local neighborhood, so if it equals the number of samples, you are essentially assuming a single neighborhood. The umap-learn tutorial uses values from 2 up to a quarter of the total sample size as examples of extremely low and extremely high settings.
