I have 2,000 genome assemblies of a particular Salmonella serovar, each in its own FASTA file (each file may contain 5-6 contigs). I need to align them to each other so that I can generate a tree. What would be the best tool for aligning these sequences? The aligned output needs to be a single FASTA file. Thank you.
Why do you need a tree? A tree with 2,000 tips isn't really good for anything, and it would take pretty much forever to build.
I suggest running an all-vs-all Mash comparison (choose the k-mer size k and the sketch size s wisely) and then clustering the resulting distance matrix with affinity propagation. Even on a laptop, this takes only a few minutes. If you use the R AP implementation, you can also output a heatmap with a dendrogram.
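In case it helps, here is a minimal Python sketch of the step between Mash and the clustering: turning the pairwise table into a square distance matrix. It assumes the table is the tab-separated output of `mash dist` (reference, query, distance, p-value, shared hashes). The function name `read_mash_dist` and the toy file names are made up for illustration, and the distances in the example are invented, not real results.

```python
import csv
import io


def read_mash_dist(handle):
    """Build a symmetric distance matrix from tab-separated `mash dist` output.

    Assumed columns: reference ID, query ID, distance, p-value, shared hashes.
    Only the first three columns are used here.
    """
    names = []   # genome IDs in order of first appearance
    index = {}   # genome ID -> row/column position
    pairs = []   # (reference, query, distance) records

    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        ref, query, dist = row[0], row[1], float(row[2])
        for name in (ref, query):
            if name not in index:
                index[name] = len(names)
                names.append(name)
        pairs.append((ref, query, dist))

    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for ref, query, dist in pairs:
        i, j = index[ref], index[query]
        matrix[i][j] = dist
        matrix[j][i] = dist  # Mash distance is symmetric
    return names, matrix


# Toy input with invented values, standing in for a real all-vs-all table.
example = (
    "a.fasta\ta.fasta\t0\t0\t1000/1000\n"
    "a.fasta\tb.fasta\t0.01\t0\t650/1000\n"
    "a.fasta\tc.fasta\t0.02\t0\t480/1000\n"
    "b.fasta\tc.fasta\t0.015\t0\t560/1000\n"
)
names, matrix = read_mash_dist(io.StringIO(example))
```

Affinity propagation works on similarities rather than distances, so one common choice (an assumption here, not something stated above) is to pass the negated distances to whichever implementation you use.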
Thank you. This answer led me to discover mashtree. I have access to a cluster, and it ran really fast there.