3.8 years ago
A_heath
Hi all,
I am using Mugsy to perform multiple genome alignments and the gmaj viewer to display them, and it's working great.
I have an alignment between two close Clostridium strains and I would like to know the percentage of identity between the two genomes.
Do you have any ideas on how to get this information?
Thank you so much in advance for your help!
Audrey
I haven't used mugsy in a long time - this might be something it can output directly?
Otherwise, if you have the alignment file itself, it's fairly straightforward to write a small script to calculate this. See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L108-L124
Thank you Joe for your reply.
Knowing that I have a .maf result file, do you think that I will be able to apply those scripts?
Not directly - they aren't designed for that. It was just an illustration that calculating the %ID is fairly simple. MAF is a slightly more difficult format to deal with, since (IIRC) it's not one continuous sequence but a series of separate alignment blocks. Are you just after a single value to explain the sequence distance?
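For what it's worth, here is a minimal sketch of how a %ID could be pulled out of a MAF file block by block. It is not something Mugsy provides, and it rests on a few assumptions: each block starts with an `a` line, each sequence line has the form `s src start size strand srcSize text`, and the source names begin with the genome name (the `strainA`/`strainB` prefixes below are made up for illustration). Gap columns are skipped, and blocks that lack either genome are ignored, so the value only describes the aligned portion.

```python
def maf_blocks(lines):
    """Yield each MAF alignment block as a dict {source_name: aligned_text}."""
    block = {}
    for line in lines:
        fields = line.split()
        # A blank line or a new "a" line closes the current block.
        if not fields or fields[0] == "a":
            if block:
                yield block
            block = {}
        # "s" lines: s src start size strand srcSize text
        elif fields[0] == "s" and len(fields) >= 7:
            block[fields[1]] = fields[6]
    if block:
        yield block


def percent_identity(lines, genome_a, genome_b):
    """Return (%ID, aligned_columns) over blocks containing both genomes.

    genome_a / genome_b are matched as prefixes of the MAF source names
    (an assumption about how the sources are labelled).
    """
    matches = aligned = 0
    for block in maf_blocks(lines):
        seq_a = next((t for s, t in block.items() if s.startswith(genome_a)), None)
        seq_b = next((t for s, t in block.items() if s.startswith(genome_b)), None)
        if seq_a is None or seq_b is None:
            continue  # block does not contain both genomes
        for x, y in zip(seq_a.upper(), seq_b.upper()):
            if x == "-" or y == "-":
                continue  # skip columns with a gap in either genome
            aligned += 1
            matches += x == y
    pid = 100.0 * matches / aligned if aligned else float("nan")
    return pid, aligned
```

To run it on a real file you would pass an open file handle, e.g. `percent_identity(open("alignment.maf"), "strainA", "strainB")`, with the file name and prefixes replaced by your own. It is worth reporting the number of aligned columns next to the %ID, since a high identity over a small aligned fraction means something quite different from the same identity over most of the genome.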
Using the gmaj viewer to get a global overview of the alignment, the two genomes that I'm comparing seem relatively close to each other. I would just like to have a %ID to make these results easier to interpret. I haven't seen any information on this in the Mugsy documentation.
Generally speaking, a %ID from a full genome alignment is not going to be incredibly informative - you might want to consider looking into mash distances, which are a bit more robust for these types of comparisons (and don't care about discontinuous sequences).
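To give a feel for what a mash distance measures, here is a rough sketch of the underlying calculation: the Jaccard index j of the two k-mer sets is converted to a distance with D = -(1/k) ln(2j / (1 + j)). This is only an illustration, not the mash implementation. It uses exact k-mer sets rather than MinHash sketches and ignores reverse complements, and the function names are my own. For real genomes you would run the tool itself (e.g. `mash dist` on the two FASTA files).

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_style_distance(seq_a, seq_b, k=21):
    """Mash-style distance from the exact k-mer Jaccard index.

    Illustrative only: mash estimates the Jaccard index from MinHash
    sketches and uses canonical (strand-independent) k-mers.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance
    d = -math.log(2 * j / (1 + j)) / k
    return max(0.0, min(1.0, d))
```

A distance of 0 means the k-mer sets are identical and 1 means nothing is shared. Because it works on k-mer content, it does not need the genomes to be aligned or contiguous, which is why it copes well with draft assemblies.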