I want to generate a distance matrix from many nucleotide sequences and then run an NMDS (non-metric multidimensional scaling) analysis on it to see how the sequences cluster. Can anyone suggest which R package I should use to create the distance matrix?
One easy-to-find option for computing a distance matrix is DistanceMatrix in the DECIPHER package: https://rdrr.io/bioc/DECIPHER/man/DistanceMatrix.html
However, whether this is applicable, or even advisable, depends on your data and your goal. First, your sequences must be aligned. Second, you say you need to perform clustering, but what for? For generating phylogenetic trees, for example, clustering a distance matrix is not state-of-the-art. It should only be used if there are so many sequences that not even ML methods like FastTree or IQ-TREE are applicable.
Did you google first?
I am sorry, sir, I am an absolute beginner in R. I found DistanceMatrix on Google, but the issue is that I do not want to calculate the Hamming distance between sequences. I want to calculate the p-distance, and my sequences are not all the same length, so I got confused about which package to use. I also do not intend to generate a phylogenetic tree; I just need an NMDS (non-metric multidimensional scaling) plot to see which sequences cluster together.
If your sequences are not aligned, you need to align them first, and that is better done outside of R. How many sequences are there, exactly?
You can possibly do that with MEGA, see here: https://www.megasoftware.net/mega1_manual/Distance.html#:~:text=p%2Ddistance,p%20%3D%20nd%2Fn.
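For reference, the p-distance mentioned in that link (p = nd/n) is just the fraction of compared sites that differ between two aligned sequences. Here is a minimal Python sketch of that formula, not MEGA's or DECIPHER's implementation. The function names `p_distance` and `distance_matrix` are made up for illustration, and skipping any column where either sequence has `-`, `N` or `?` (pairwise deletion) is my own assumption about gap handling:

```python
from typing import List


def p_distance(a: str, b: str, skip_chars: str = "-N?") -> float:
    """Proportion of differing sites between two ALIGNED sequences (p = nd / n)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0  # n: sites where both sequences have a usable base
    diffs = 0     # nd: sites where the two bases differ
    for x, y in zip(a.upper(), b.upper()):
        # Assumption: ignore columns with a gap or ambiguous base in either sequence.
        if x in skip_chars or y in skip_chars:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        return float("nan")  # nothing comparable between the two sequences
    return diffs / compared


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances with zeros on the diagonal."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(seqs[i], seqs[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

This pure-Python loop is only meant to show what is being computed; it would be far too slow for tens of thousands of sequences, where a dedicated tool is the better choice.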
It really depends on the size of your dataset.
I have around 30,000 sequences per file. Thank you for the help.
Aligning them will be a challenge, especially if you don't have a high-memory server available. Try MAFFT, e.g. on this server: https://mafft.cbrc.jp/alignment/server/large.html , or in Galaxy, or on a local server if you have one. But these options may take a long time or fail.
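Apart from the alignment, keep in mind that the distance matrix itself grows quadratically with the number of sequences. A rough back-of-the-envelope sketch in Python, assuming each distance is stored as an 8-byte float (the helper name `matrix_size` is hypothetical, and actual memory use depends on the tool and its storage format):

```python
def matrix_size(n_seqs: int, bytes_per_value: int = 8):
    """Return (number of unique pairs, bytes for a full n x n matrix)."""
    pairs = n_seqs * (n_seqs - 1) // 2             # unique pairwise comparisons
    full_bytes = n_seqs * n_seqs * bytes_per_value  # dense square matrix
    return pairs, full_bytes


pairs, full_bytes = matrix_size(30_000)
print(f"{pairs:,} pairwise distances")           # 449,985,000 pairwise distances
print(f"{full_bytes / 1e9:.1f} GB as a dense matrix")  # 7.2 GB as a dense matrix
```

So for about 30,000 sequences per file, even the matrix for a single file is a few gigabytes before NMDS starts working on it.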
ok sir thank you