I have a few bacterial whole genomes. I would like to align them to each other (multiple sequence alignment) and interactively explore the alignment down to individual bases, together with the annotations. I have FASTA genomes along with their respective GFF annotations. What are currently the best tools, workflows and formats for this?
My experience so far: common MSA tools like Clustal, MUSCLE, MAFFT, T-Coffee, Kalign etc. are too slow and impractical for 5 Mb sequences. Whole-genome-scale tools include Mauve, MUMmer and Mugsy. Mauve is neat, but it's not possible to go down to the nucleotide level to explore nucleotide changes in genes or to see which proteins have changed. MUMmer has no interactive capability. Mugsy produces MAF output that none of the standard viewers I tried (AliView, UGENE, MView, MViewer, Wasabi etc.) could read. It only works with Gmaj. Gmaj is the worst genome browser ever: I have no idea what it is visualising, and the display makes no sense to me. I am not sure if something like IGV would work for MSF. I see IGV as something for exploring data mapped/aligned to a common reference.
This is too complex. To make it simpler, I would start by selecting the gene regions (annotated sequences) of a reference genome (let's say E. coli in your case). From there you can find all the orthologs in the other bacterial species: use MCScanX (http://chibba.pgml.uga.edu/mcscan2/) for this purpose. MCScanX can be used to find orthologs between two or more species. It is a very simple tool to run, and it takes BLAST output (together with a file of gene positions) as input. You can also find homologs, collinearity and Ka/Ks values among species with this tool.
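To give an idea of what working from BLAST output looks like, here is a minimal Python sketch that picks reciprocal best hits as candidate orthologs between two genomes. This is not what MCScanX does internally (MCScanX looks for collinear blocks); it is a simpler stand-in for illustration. It assumes BLAST was run in both directions with the default tabular format (-outfmt 6, bitscore in the 12th column), and the gene IDs in the demo are made up.

```python
import csv
import io


def best_hits(blast_tsv):
    """Map each query ID to its highest-bitscore subject ID.

    blast_tsv is any iterable of lines in BLAST tabular format
    (outfmt 6: qseqid, sseqid, ..., evalue, bitscore).
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical gene IDs and scores, for demonstration only.
    a_vs_b = io.StringIO(
        "ecoli_g1\tsal_g1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t540\n"
        "ecoli_g1\tsal_g2\t70.0\t300\t90\t0\t1\t300\t1\t300\t1e-40\t160\n"
    )
    b_vs_a = io.StringIO(
        "sal_g1\tecoli_g1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t540\n"
    )
    for gene_a, gene_b in reciprocal_best_hits(a_vs_b, b_vs_a):
        print(gene_a, gene_b)
```

Once you have such pairs, each gene family is small enough to align with MAFFT or MUSCLE and to open in an ordinary alignment viewer.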
There is no tool that can manage this. Genome-scale MSA at the exact nucleotide level just isn’t possible at the moment.
The best you could achieve is probably multiple pairwise alignments, and even then, they may only be pseudo-alignments, of dubious quality at the exact nucleotide level.
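If you go down the pairwise route, one way to get at nucleotide changes in genes is to align each genome to a single chosen reference and report differences in reference coordinates, then look up which annotated features they fall in. A rough Python sketch, assuming you already have each pairwise alignment as two gapped strings of equal length (from whatever aligner you use) and a GFF file for the reference; the function names here are my own and not from any library.

```python
def differences_vs_reference(ref_aln, qry_aln):
    """List (ref_pos, ref_base, qry_base) for every differing column.

    ref_pos is 1-based in ungapped reference coordinates. For an
    insertion relative to the reference (gap in ref_aln), ref_pos is
    the last reference base before the insertion (0 if there is none).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    diffs = []
    pos = 0
    for ref_base, qry_base in zip(ref_aln.upper(), qry_aln.upper()):
        if ref_base != "-":
            pos += 1  # advance only on real reference bases
        if ref_base != qry_base:
            diffs.append((pos, ref_base, qry_base))
    return diffs


def read_gff_features(lines, feature_type="gene"):
    """Return (start, end, attributes) for GFF rows of one feature type.

    GFF coordinates are 1-based and inclusive (columns 4 and 5).
    """
    features = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        features.append((int(cols[3]), int(cols[4]), cols[8]))
    return features


def annotate(diffs, features):
    """Attach to each difference the attributes of overlapping features."""
    annotated = []
    for pos, ref_base, qry_base in diffs:
        hits = [attrs for start, end, attrs in features if start <= pos <= end]
        annotated.append((pos, ref_base, qry_base, hits))
    return annotated
```

This only reports what the pairwise alignment says, so all the caveats about alignment quality at the nucleotide level still apply, and anything in a rearranged or unaligned region will simply be missing.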