Hi.
I have multiple assemblies that were created with different programs. The assemblies come from the genomes of different individuals of the same species, so they are expected to be highly similar.
I started by running the command
nucmer --minalign=100 -p <name> <query-sequence> <reference-sequence>
and then the command
mummerplot --png output.delta
which yielded a plot of scattered dots instead of the expected diagonal line.
Is there something in these commands that could have caused this result? Perhaps they need additional parameters, but I don't know which ones to use.
Edit:
I used samuel.a.odonell's advice and got an improved plot, but I'm still having issues. The dots are still rather scattered. I thought about keeping only matches of 1000 bp or more, and I've been looking for a way to rearrange the contigs so that the plot shows a diagonal line. I'm also looking for a way to change the axis labels, so that instead of a smear of innumerable labels there is a numeric scale.
I've been looking for solutions for these issues for the last two days but couldn't find any. I'm sure they're there somewhere but I guess it's a needle in the haystack situation.
Here's one of the plots I got: mummerplot output
Aside from the scattered dots, there is a clear vertical line for each contig/scaffold. That would suggest you have an array of repeats in these contigs.
Do you see this for every contig you have assembled? You could also filter the alignment (the delta file) to try to remove the small repetitive alignments (using the -g flag).
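A rough sketch of that filtering step, assuming MUMmer's delta-filter and mummerplot are on the PATH. The file names and the `filter_delta` helper are placeholders, not something from the original run. Note that -g keeps only the one-to-one global alignment, so real rearrangements are dropped along with the repeats, and -l 1000 is just the 1000 bp cutoff floated in the question's edit.

```shell
# Placeholder file names; substitute your own.
DELTA=output.delta
FILTERED=output.filtered.delta

# -g      keep the one-to-one global alignment (no rearrangements)
# -l 1000 drop alignments shorter than 1000 bp
filter_delta() {
    delta-filter -g -l 1000 "$1" > "$2"
}

# Only run when the tool and the input file are actually present.
if command -v delta-filter >/dev/null 2>&1 && [ -f "$DELTA" ]; then
    filter_delta "$DELTA" "$FILTERED"
    # Re-plot from the filtered delta file; writes filtered.png.
    mummerplot --png -p filtered "$FILTERED"
fi
```

If -g turns out to be too strict, delta-filter also has -1 (one-to-one, but allowing rearrangements), which may be worth comparing.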
Yes. All the outputs look this way.
Great. Thanks. I'll try.
I tried your suggestions and got improved results. Could you take a look at the new plot? I added it to the question. Thanks.
Is this new plot the whole newly assembled genome against a reference? They both seem to have a lot of contigs, no? One way to tidy up all the contigs would be to try reference-based scaffolding. If your reference is not very contiguous it may not work very well, but it should let you order your contigs as in the reference and so get closer to a diagonal line (although there may be better ways to do it). Could you show just a few of the largest contigs again with this filtering? You could also look at the nucmer alignment with dnadiff (give it your delta file) to see how much of the reference is covered by alignments, the percentage similarity, etc.
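Something along these lines might cover both points, assuming mummerplot and dnadiff are installed. The file names, output prefixes and the two helper functions are made up for illustration. mummerplot's --layout option reorders and orients the sequences so the largest hits fall near the main diagonal, and it needs the sequence files passed with -R and -Q. That only changes the plot and is not a substitute for real scaffolding.

```shell
# Placeholder file names; substitute your own.
REF=reference.fasta
QRY=assembly.fasta
DELTA=output.delta
FILTERED=output.filtered.delta

# --layout needs -R and -Q to know the sequence names and lengths.
plot_layout() {
    mummerplot --png --layout -R "$1" -Q "$2" -p layout "$3"
}

# dnadiff -d reuses an existing delta file; the summary is <prefix>.report.
summarise() {
    dnadiff -p "$2" -d "$1"
}

# Only run when the tools and the input files are actually present.
if command -v mummerplot >/dev/null 2>&1 && command -v dnadiff >/dev/null 2>&1 \
        && [ -f "$DELTA" ] && [ -f "$FILTERED" ]; then
    plot_layout "$REF" "$QRY" "$FILTERED"
    summarise "$DELTA" dnadiff_out
    # Coverage and identity lines from the dnadiff summary.
    grep -E 'AlignedBases|AvgIdentity' dnadiff_out.report
fi
```

The AlignedBases and AvgIdentity lines in the report give a quick view of how much of each assembly is covered and how similar the aligned regions are.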
Yes, this is the whole genome assembly aligned against the reference. This is the first time I'm doing this kind of analysis, so I don't know whether that's a lot of contigs or not. Tomorrow I'll be able to align a small number of contigs and post the result here. I tried using dnadiff before and it gave a messy plot. Do you mean I should use it after filtering? I haven't tried that yet.