9.7 years ago
YMTIO
Hi everyone,
I want to count the shared (identical) nucleotides between every pair of sequences in one alignment, and get a matrix of those counts for all pairs in one batch run. However, I cannot program. Can anyone help me solve this? Thanks!
Thanks! But I want a matrix of the shared counts for all pairwise comparisons, produced in a batch way.
Alignment tools like MAFFT, MUSCLE, etc. do this for you if you are not comfortable writing a script that compares all k-mers or aligned bases between every pair of sequences to construct a distance matrix.
Follow the steps given in this link: http://mafft.cbrc.jp/alignment/software/treeout.html
(Keep in mind that your input should be the raw FASTA sequences.)
Otherwise, you could use this Python code (slight modification required) to do all possible pairwise comparisons in your alignment file and get the distance matrix:
Hi Felix,
Thanks for your reply. However, I do not want a distance matrix. I just want the shared-number matrix (the number of nucleotides shared by any two sequences), similar to an identity matrix.
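For what it's worth, a shared-count matrix like the one described can be sketched in plain Python without any extra libraries. This is only a minimal sketch under some assumptions that are not from the thread: the input is an already-aligned FASTA (all sequences the same length), gaps are written as `-`, a column where both sequences have a gap is not counted as shared, and the function names (`read_fasta`, `shared_count`, `shared_matrix`) and the three toy sequences are made up for illustration.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence} (hypothetical helper)."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def shared_count(a, b, gap="-"):
    """Count aligned columns where both sequences carry the same non-gap character."""
    return sum(1 for x, y in zip(a, b) if x == y and x != gap)


def shared_matrix(seqs):
    """Build a symmetric matrix (dict of dicts) of shared counts for all pairs."""
    names = list(seqs)
    matrix = {n: {} for n in names}
    for n in names:
        # Diagonal: number of non-gap positions in the sequence itself.
        matrix[n][n] = shared_count(seqs[n], seqs[n])
    for a, b in combinations(names, 2):
        count = shared_count(seqs[a], seqs[b])
        matrix[a][b] = matrix[b][a] = count
    return names, matrix


# Toy alignment, invented for this example; replace with open("alignment.fasta").
example = """>seq1
ACGT-ACGTA
>seq2
ACGTTACGAA
>seq3
AC-TTACCTA
""".splitlines()

seqs = read_fasta(example)
names, matrix = shared_matrix(seqs)

# Print a tab-separated table that can be pasted into a spreadsheet.
print("\t".join([""] + names))
for a in names:
    print("\t".join([a] + [str(matrix[a][b]) for b in names]))
```

To run it on a real file, the `example` lines could be swapped for something like `open("alignment.fasta")` (a placeholder file name). If ambiguous bases such as `N` should not count as shared, the condition in `shared_count` would need an extra check.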