Dear all,
I have a multi-FASTA file containing 1,000 coding sequences, and I would like to compute Ka/Ks for every pair of sequences.
For the moment, I perform a multiple sequence alignment at the protein level with MUSCLE v5, then trim it and back-translate the protein alignment into a coding sequence alignment using trimAl. I can then use seqinr (an R package) to compute Ka and Ks for all possible pairs.
I would now like to run the same kind of analysis with pairwise sequence alignments instead of a multiple sequence alignment. I could use a for or while loop in bash, calling muscle and seqinr at each iteration, but with 1,000 sequences in the file this means 499,500 pairwise alignments (n(n-1)/2 with n = 1,000), each followed by a Ka/Ks computation.
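For reference, the number of unordered pairs among n sequences is n(n-1)/2, which gives the figures quoted here. A minimal Python sketch (the function names are illustrative, not from any package) that counts the pairs and enumerates them lazily with `itertools.combinations`, so the list of index pairs never has to be held in memory at once:

```python
from itertools import combinations
from typing import Iterator, Tuple


def n_pairs(n: int) -> int:
    """Number of unordered pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


def iter_pairs(n: int) -> Iterator[Tuple[int, int]]:
    """Lazily yield index pairs (i, j) with i < j; nothing is materialized."""
    return combinations(range(n), 2)


if __name__ == "__main__":
    print(n_pairs(1_000))        # 499500
    print(n_pairs(9_000))        # 40495500
    print(list(iter_pairs(4)))   # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Each (i, j) would index into the list of sequences read from the FASTA file; a loop over these pairs replaces the nested bash loop.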
Furthermore, I want to repeat this for many different genes, i.e. for many multi-FASTA files, each containing 1,000 sequences or sometimes even more. The largest file contains 9,000 sequences, which means 40,495,500 pairwise comparisons.
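Since every pair is independent, one way to spread tens of millions of comparisons over several machines (assuming something like a cluster job array is available) is to give each job a fixed share of the pair enumeration. A minimal sketch with a hypothetical helper `pairs_for_shard`, which assigns pairs round-robin so that the shards together cover every pair exactly once and differ in size by at most one:

```python
from itertools import combinations, islice
from typing import Iterator, Tuple


def pairs_for_shard(n: int, n_shards: int, shard: int) -> Iterator[Tuple[int, int]]:
    """Yield every n_shards-th pair (i, j), i < j, starting at offset `shard`.

    Shards 0..n_shards-1 together cover each pair exactly once.
    """
    if not 0 <= shard < n_shards:
        raise ValueError("shard must be in [0, n_shards)")
    # islice(iterable, start, stop, step): skip to `shard`, then take every n_shards-th pair
    return islice(combinations(range(n), 2), shard, None, n_shards)


if __name__ == "__main__":
    # e.g. 1,000 sequences split over 10 independent jobs; this is job 0
    print(sum(1 for _ in pairs_for_shard(1_000, 10, 0)))  # 49950
```

With a job array, each task would pass its own index as `shard`; under Slurm, for instance, that index is available in the `SLURM_ARRAY_TASK_ID` environment variable. Every shard still walks the full pair iterator to find its own pairs, which is cheap compared with the alignments themselves.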
Does anyone have an idea of how I could achieve this, or of another method to perform such pairwise alignments plus Ka/Ks calculations very rapidly?
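On a single machine, the same independence means the pairs can be spread across CPU cores instead of being run one after another in a bash loop. Below is a minimal Python skeleton using the standard-library `multiprocessing.Pool`. `compare_pair` is a hypothetical placeholder: it only computes the fraction of differing codons between ungapped coding sequences, and would have to be replaced by the real pairwise alignment and Ka/Ks step (for example, calls to whichever aligner and Ka/Ks tool are chosen). The sequences are sent to each worker once through the pool initializer, so each task only carries a pair of names.

```python
from itertools import combinations
from multiprocessing import Pool
from typing import Dict, Iterator, Tuple

# Sequences are stored once per worker process (via the Pool initializer),
# so each task only carries a small (name, name) tuple.
SEQS: Dict[str, str] = {}


def _init_worker(seqs: Dict[str, str]) -> None:
    global SEQS
    SEQS = seqs


def compare_pair(pair: Tuple[str, str]) -> Tuple[str, str, float]:
    """HYPOTHETICAL PLACEHOLDER for "align the pair, then compute Ka/Ks".

    It only returns the fraction of differing codons between two ungapped
    CDS, compared position by position up to the shorter one, so that the
    skeleton runs end to end.
    """
    name_a, name_b = pair
    a, b = SEQS[name_a], SEQS[name_b]
    codons_a = [a[k:k + 3] for k in range(0, len(a) - 2, 3)]
    codons_b = [b[k:k + 3] for k in range(0, len(b) - 2, 3)]
    n = min(len(codons_a), len(codons_b))
    diffs = sum(x != y for x, y in zip(codons_a, codons_b))
    return name_a, name_b, diffs / n if n else float("nan")


def run_all(seqs: Dict[str, str], workers: int = 4,
            chunksize: int = 1000) -> Iterator[Tuple[str, str, float]]:
    """Yield one result per unordered pair of sequence names."""
    pairs = combinations(sorted(seqs), 2)
    if workers == 1:
        # Serial path: handy for debugging, no extra processes.
        _init_worker(seqs)
        yield from map(compare_pair, pairs)
        return
    with Pool(workers, initializer=_init_worker, initargs=(seqs,)) as pool:
        # imap_unordered yields results in completion order, chunk by chunk.
        yield from pool.imap_unordered(compare_pair, pairs, chunksize=chunksize)
```

In a script, `run_all` should be called under an `if __name__ == "__main__":` guard, e.g. `for a, b, value in run_all(seqs, workers=8): ...`, writing each result to disk as it arrives rather than collecting millions of them in memory. A large `chunksize` keeps inter-process overhead low when individual comparisons are fast. Note that the pool reads the pair iterator ahead of the workers, so for the largest files it may be preferable to run separate slices of the pairs as independent jobs.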
Thanks for any help !
All the best,
Maxime Policarpo