Hello. I am looking for a tool or a methodology to make whole genome alignments with 50+ genomes. I used to use Gegenees, which basically constructs a similarity matrix (heatmap) based on all vs all BLAST results. Recently, Gegenees stopped working and I don't know why. It might be because I updated my Java version. It prints out some errors related to Eclipse (I don't know what that is). Anyway, is there any other tool or method similar to Gegenees? I need to know the similarity percentage between the organisms I am working with in order to define cut-off values for pan-genome analysis. Thank you!
The only tools I know of that are capable of multiple whole genome alignments are LAST and Mauve, and even then, the alignments will be poor and take a long time. This is a largely unsolved problem in bioinformatics: the data is just too large.
If you want to do all vs all pairwise alignments, you can use Wouter’s suggestion, or I believe mummer can do this fairly rapidly. Just bear in mind that with 50 genomes you’ll have, at a minimum, 50 choose 2 = 1225 alignments to do, which is unlikely to be fast in any circumstances. You might be able to use mash distances as a surrogate, though I’m not sure what the relationship between mash distances and ANI is (if any).
Another tool which can do that is minimap2.
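To make the 1225 figure concrete, here is a minimal Python sketch that enumerates the unordered genome pairs for an all vs all comparison and fills a symmetric similarity matrix of the kind Gegenees draws as a heatmap. The file names and the `score` callback are hypothetical placeholders: in practice `score` would wrap whatever pairwise tool you settle on (mummer, mash, minimap2) and parse its output.

```python
from itertools import combinations
from math import comb


def pairwise_jobs(genomes):
    """Return every unordered pair of genomes (no self-pairs, no duplicates)."""
    return list(combinations(genomes, 2))


def similarity_matrix(genomes, score):
    """Build a symmetric all-vs-all matrix from a pairwise scoring function.

    `score(a, b)` is a hypothetical callback returning a similarity
    percentage for two genomes; it is not part of any real tool's API.
    """
    index = {name: i for i, name in enumerate(genomes)}
    n = len(genomes)
    # The diagonal is set to 100.0 because each genome is identical to itself.
    matrix = [[100.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in combinations(genomes, 2):
        value = score(a, b)
        # Assumes the score is symmetric, so one run fills both cells.
        matrix[index[a]][index[b]] = value
        matrix[index[b]][index[a]] = value
    return matrix


# Hypothetical input: 50 placeholder file names.
genomes = [f"genome_{i:02d}.fasta" for i in range(1, 51)]
jobs = pairwise_jobs(genomes)
print(len(jobs), comb(len(genomes), 2))  # prints "1225 1225"
```

The count grows quadratically, so 100 genomes would already mean 4950 pairs. If the tool's score depends on which genome is the reference and which is the query, the symmetric fill above no longer holds and you would need both directions, doubling the number of runs.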
Thank you for all the answers. Just to give some feedback: searching the web, I ended up finding GET_HOMOLOGUES, which can be used to create pan-genome matrices and so fits what I needed.
Glad to hear you were able to move forward.
However (and for future reference), if GET_HOMOLOGUES has helped you do so, I'm afraid your question was not really 'on topic', because 'whole genome alignments' is not something it will do. AFAIK it works at the gene/protein level.