Question

Mauve Similarity plots - similar or not?

9.0 years ago

rrcutler ▴ 120

Hello all. I have been using Mauve to visualize draft genome assemblies compared against a reference genome, and I have been reading the website on how to interpret the similarity plots displayed in the viewer. Specifically, it says: "The height of the similarity profile is calculated to be inversely proportional to the average alignment column entropy over a region of the alignment." I understand this to mean that the greater the height, the more similar the two sequences are (correct me if I'm wrong). However, when I choose to display the similarity ranges, I have trouble understanding how to interpret them.
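To make the quoted definition concrete, here is a minimal Python sketch of how a height could be derived from column entropy. It is not Mauve's actual code: the function names, the non-overlapping window, and the linear scaling `1 - mean_entropy / max_entropy` are all assumptions chosen only so that lower entropy gives a taller profile.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def similarity_heights(aligned_seqs, window=4):
    """Height per non-overlapping window of columns.

    Assumption (hypothetical scaling, not taken from Mauve):
    height = 1 - mean_entropy / log2(number of sequences).
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    max_entropy = math.log2(len(aligned_seqs))
    entropies = [
        column_entropy([s[i] for s in aligned_seqs]) for i in range(length)
    ]

    heights = []
    for start in range(0, length, window):
        chunk = entropies[start:start + window]
        mean_entropy = sum(chunk) / len(chunk)
        heights.append(1.0 - mean_entropy / max_entropy)
    return heights


# Identical sequences have zero entropy everywhere, so the profile is at full height.
print(similarity_heights(["ACGTACGT", "ACGTACGT"]))  # [1.0, 1.0]
# One mismatch in the second window lowers only that window.
print(similarity_heights(["ACGTACGT", "ACGTACGA"]))  # [1.0, 0.75]
```

Under these assumptions, a taller profile does mean more similar sequences, which matches the reading above.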

So it seems that when comparing similar sequences, the similarity plots (purple) should be fairly high and full, as in this one: (image: mauve plot)

When I select both the similarity plot and the similarity ranges, my plots look like this (the similarity range lines are dark purple):

(image: mauve plot)

Furthermore, I visualized two identical sequences in Mauve and got this:

(image: mauve plot)

So how do I interpret what the similarity plot ranges mean? What I think is that the higher a similarity range line is, the narrower the range of differences between the sequences. The spiky purple shading also indicates ranges, but it seems more exaggerated than the dark purple line.

Also, in the Mauve output, what are the .backbone and .bbcols files?

Many Thanks

Mauve genome assembly • 3.7k views

Updated 9.0 years ago by aaron.darling ▴ 40 • written 9.0 years ago by rrcutler ▴ 120

score 4 · Answer 1 · 2016-07-27

Starting with the 2.4 releases, Mauve no longer shades the entire area under the similarity curve; instead, it draws a bold line for the median similarity value over a region. The 'ranges', which create the fattened, jagged areas in lighter purple in the 2nd plot of your post, show the range of similarity values in that region. This adds context to the median line, making it possible to see regions where small pieces may be missing or changed at a scale below the resolution of the current zoom level. As a result, the plot can become visually intense, to the point that I thought of dubbing the 2.4 release the 'visual assault' release. I am certain that Mauve is violating some fundamental laws and regulations of data visualization.

In your 3rd plot the two sequences are identical, so the median similarity is 100% and the range covers 100%-100%; it all just appears as a single purple line.
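The median-plus-range idea can be illustrated with a small Python sketch. This is only an illustration, not Mauve's drawing code: the function name `summarize_bins`, the fixed bin size, and the example similarity values are all hypothetical. Each bin stands for the stretch of alignment that one on-screen position has to represent at the current zoom level.

```python
from statistics import median


def summarize_bins(values, bin_size):
    """Return (low, median, high) for each consecutive bin of similarity values.

    `values` is assumed to hold one similarity value (0.0 to 1.0) per
    alignment position; `bin_size` is how many positions share one plotted point.
    """
    summaries = []
    for start in range(0, len(values), bin_size):
        chunk = values[start:start + bin_size]
        summaries.append((min(chunk), median(chunk), max(chunk)))
    return summaries


# Identical sequences: range and median coincide, so only a single line is visible.
identical = [1.0] * 8
print(summarize_bins(identical, 4))
# [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]

# One small difference inside the first bin: the median stays at 1.0,
# but the lower end of the range drops and reveals the hidden change.
one_change = [1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 1.0]
print(summarize_bins(one_change, 4))
# [(0.2, 1.0, 1.0), (1.0, 1.0, 1.0)]
```

In the second example the median alone would hide the change, while the range makes it visible, which is the context the range shading is meant to add.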

The user guide contains a description of the backbone file, so I won't recapitulate it here.

Hope that helps!