Hello everyone,
I'm currently working with a dataset comprising 496 cancer samples, and my aim is to perform class discovery to identify potential clusters within the data. My plan is to construct a graph using the igraph package and then apply Louvain clustering to uncover communities within the dataset.
To start, I've run a nearest-neighbor search using the RANN package as the first step towards an adjacency matrix. The outcome of this step is a list with two elements: nn.idx, which provides the nearest-neighbor indices, and nn.dists, which holds the corresponding Euclidean distances. Both matrices have dimensions of 496 x 10, representing the nearest-neighbor information for each sample.
I'm seeking guidance on how to proceed from here to generate a square adjacency matrix suitable for building a graph object. I truly appreciate any assistance or insights you can provide.
Here's the script I've used for the nearest neighbors search:
neighbors <- RANN::nn2(df.dat.2)
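As a hedged aside on what this call returns: when nn2() is given only the data, each sample is also searched against itself, so the first column of nn.idx is normally the sample's own row and the first column of nn.dists is zero. The sketch below illustrates this on a random toy matrix; the object names and k = 6 are assumptions for illustration, and the RANN part runs only if the package is installed.

```r
# Toy stand-in for df.dat.2: 20 samples x 5 features of random values.
set.seed(1)
toy <- matrix(rnorm(100), nrow = 20)
k <- 6  # assumed value, chosen only for illustration

if (requireNamespace("RANN", quietly = TRUE)) {
  nn <- RANN::nn2(toy, k = k)
  print(dim(nn$nn.idx))                      # one row per sample, k columns
  print(all(nn$nn.idx[, 1] == seq_len(20)))  # is each sample its own first neighbour?
  print(all(nn$nn.dists[, 1] == 0))          # is that self-distance zero?
}
```

In practice this means that k = 10 yields at most 9 neighbours other than the sample itself.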
This is a glimpse of how my data looks:
df.dat.2[1:6,1:5]
                Cytotoxic_lymphocytes   NK_cells IMMUNE_CD8MACRO_GALON IMMUNE_TREG_PASTILLE IMMUNE_TH1_GALON
DFR18201125_S2              1.0847794 -0.7873059             2.0384016            2.3400656        1.6245264
CMR18160811_S14             1.4107888  0.7359309             2.3173862           -0.6108772        1.1848030
AKNR18200612_S1             1.4906686  1.4865404             1.8352200            0.4802375        1.5588536
X211_S39                   -0.2367320 -0.4840597            -1.4358730            0.5417923       -1.1066911
X213_S41                   -0.6760709 -0.5416064            -2.3613827            0.1748922       -1.6711970
X214_S42                   -1.0435179 -0.4920662            -0.8881929           -0.6243585       -0.8318914
Thank you for your insights and suggestions in advance.
Thanks bk11, but I want to create the graph adjacency matrix from the nn2 method in the RANN package, not from correlation or Euclidean distances.
You need to use the nn.idx element and define k (the maximum number of nearest neighbours to compute), then do something like the example in this repository, for your reference: https://github.com/TomKellyGenetics/leiden
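A minimal sketch of how the nn.idx approach above might look in R, not a definitive implementation. The helper name knn_to_adjacency, the toy data, and k = 5 are assumptions for illustration; with real data the index matrix would be neighbors$nn.idx from RANN::nn2(). The neighbour search here uses base R so the sketch runs without RANN, and the igraph step runs only if igraph is installed.

```r
# Hypothetical helper (not part of RANN or igraph): convert an n x k matrix of
# nearest-neighbour indices into a symmetric n x n 0/1 adjacency matrix.
knn_to_adjacency <- function(nn_idx, labels = NULL) {
  n <- nrow(nn_idx)
  adj <- matrix(0L, nrow = n, ncol = n, dimnames = list(labels, labels))
  # (row, column) pairs linking each sample to its listed neighbours;
  # as.vector() reads nn_idx column by column, so the row ids repeat per column.
  pairs <- cbind(rep(seq_len(n), times = ncol(nn_idx)), as.vector(nn_idx))
  adj[pairs] <- 1L
  diag(adj) <- 0L          # drop self-matches
  (adj + t(adj) > 0) * 1L  # symmetrise: keep an edge if either sample lists the other
}

# Toy stand-in for the expression matrix: 12 samples x 4 features, random values.
set.seed(42)
toy <- matrix(rnorm(48), nrow = 12, dimnames = list(paste0("S", 1:12), NULL))
k <- 5  # assumed value for illustration only

# Base-R neighbour search: each row holds the indices of the k closest samples,
# with the sample itself in first position (distance zero).
toy_idx <- t(apply(as.matrix(dist(toy)), 1, order))[, seq_len(k)]

adj <- knn_to_adjacency(toy_idx, labels = rownames(toy))

# Build the graph and run Louvain clustering if igraph is available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
  communities <- igraph::cluster_louvain(g)
  print(igraph::membership(communities))
}
```

Symmetrising with "either side" keeps every listed neighbour as an edge; requiring both sides (mutual neighbours) would give a sparser graph, and which one suits the data is a judgment call.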
Thanks so much bk11! Are there any specific criteria or guidelines that I should consider when selecting the k value for my gene expression data?
I also got this error when running this part of the script:
Do you know how to resolve the issue?
I am working with R and this is my session info:
Sorry, it doesn't let me save the session info here because of invalid characters!
The maximum number of nearest neighbours to compute. The default value is set to the smaller of the number of columns in data
Did you install leiden? If not, do this: install.packages("leiden") or devtools::install_github("TomKellyGenetics/leiden").
Additionally, you may have to run py_install("python-igraph") and py_install("leidenalg").
Please make sure that you have installed reticulate.
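Before installing anything, it may help to see which of the R packages mentioned in this thread are actually missing. The following is only a sketch: missing_packages is a hypothetical helper name, and the default package list is an assumption based on the packages discussed here. It checks the library paths and installs nothing.

```r
# Hypothetical helper: return the names of packages that are not installed.
# system.file() returns "" when a package cannot be found in the library paths.
missing_packages <- function(pkgs = c("RANN", "igraph", "reticulate", "leiden")) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

to_install <- missing_packages()
if (length(to_install) > 0) {
  message("Missing R packages: ", paste(to_install, collapse = ", "))
} else {
  message("All listed R packages are installed.")
}

# The Python side could be checked afterwards, once reticulate is installed:
# reticulate::py_module_available("leidenalg")
# reticulate::py_module_available("igraph")
```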