Hi all,
I have assemblies of two microbial genomes from PacBio data (49 and 69 contigs, respectively) and I'm trying to figure out how similar they are. The final genome size should be about 7.5 Mb.
I remembered the MUMmer tools from a project many years ago, so I ran nucmer and got the SNP output with show-snps. That output turned out to be troublesome because it's not a VCF file, but I was able to convert the SNPs using a script I found here.
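For anyone who wants to see what such a conversion involves, here is a minimal Python sketch. It is not the script linked above. It assumes the SNPs were written in tab-delimited form (show-snps -T), that the first three columns are reference position, reference base and query base, and that the second-to-last column holds the reference contig ID. Indels (shown as "." by show-snps) are skipped, because VCF needs an anchoring base for them. The function name is made up for illustration.

```python
def show_snps_to_vcf(lines):
    """Turn tab-delimited show-snps rows into minimal VCF lines.

    Assumed column layout (show-snps -T): ref position, ref base,
    query base, ..., ref contig ID, query contig ID.
    Only single-base substitutions are kept.
    """
    out = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Header and blank lines do not start with a numeric position.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pos, ref, alt = fields[0], fields[1].upper(), fields[2].upper()
        # Skip indels (".") and ambiguous bases.
        if not {ref, alt} <= set("ACGT"):
            continue
        chrom = fields[-2]  # assumption: reference contig ID
        out.append("\t".join([chrom, pos, ".", ref, alt, ".", ".", "."]))
    return out
```

It can be called on an open file handle, e.g. show_snps_to_vcf(open("snps.tsv")), where snps.tsv is a hypothetical file name. Running show-snps with -r keeps the rows sorted by reference position, which most VCF tools expect.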
My questions:
- The mummer tools have been around for quite a while -- would you recommend something else, or are they still current?
- I heard bwa-mem can also now align whole microbial genomes -- anyone have any experience with this?
Thanks for any advice!
Indeed, MUMmer has been around for a while, but in addition to being a good tool from the start, it has been updated constantly, and MUMmer 4 is in the works.
Although bwa mem can align long(ish) sequences, there is a cap at about 10 Mb, and I think query and subject need to be pretty similar, otherwise performance degrades. Here is a (somewhat old, so possibly outdated) blog post from the BWA author. minimap2 is probably a better option.
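If you go the minimap2 route, a rough overall similarity figure can be pulled from its PAF output. The sketch below is only an illustration under some assumptions: the alignment was produced with something like minimap2 -cx asm5 ref.fa qry.fa > aln.paf (file names are hypothetical), PAF columns 10 and 11 hold the number of matching bases and the alignment block length, and secondary alignments carry the tp:A:S tag. The function name is made up.

```python
def paf_identity(lines):
    """Sum matches and alignment lengths over PAF records.

    Returns (matching bases, alignment columns, identity).
    Records tagged as secondary alignments (tp:A:S) are ignored
    so that repeated regions are not counted twice.
    """
    matches = columns = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        if "tp:A:S" in fields[12:]:
            continue  # skip secondary alignments
        matches += int(fields[9])    # PAF column 10: residue matches
        columns += int(fields[10])   # PAF column 11: alignment block length
    identity = matches / columns if columns else 0.0
    return matches, columns, identity
```

Note that the block length includes gaps, so this is a gap-penalised identity over the aligned regions only. It says nothing about sequence that is present in one assembly and missing from the other, so it is worth checking the aligned fraction of each genome as well.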
By the way, as noted in the first blog post I linked, LAST is good for very big sequences, and it is really fast.