I have two PacBio assemblies for the same plant species, and I need to determine whether there are regions represented in one assembly that are missing from the other. I have tried progressiveMauve, and while I suspect that the information I'm looking for is somewhere in its output, I'm having a hard time finding it.
Does anyone have a solution to this problem?
As an update, here are stats for one of the assemblies (the other is similar):
number of contigs: 18355
mean contig size: 27903.8 bp
median contig size: 15781 bp
total size: 512174223 bp (~512 Mb)
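A minimal Python sketch (not the poster's method) for reproducing these summary stats from a FASTA file, and also reporting the min/max size range and N50. It assumes the contig ID is the first whitespace-delimited token of each header line; the function names are hypothetical.

```python
import statistics
import sys
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_id: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(sizes: List[int]) -> int:
    """Length of the contig at which the cumulative sum, with contigs
    sorted longest-first, first reaches half of the total size."""
    total = sum(sizes)
    running = 0
    for size in sorted(sizes, reverse=True):
        running += size
        if 2 * running >= total:
            return size
    return 0


def assembly_stats(lengths: Dict[str, int]) -> Dict[str, float]:
    sizes = list(lengths.values())
    return {
        "number of contigs": len(sizes),
        "mean contig size": round(statistics.mean(sizes), 1),
        "median contig size": statistics.median(sizes),
        "min contig size": min(sizes),
        "max contig size": max(sizes),
        "N50": n50(sizes),
        "total size": sum(sizes),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for key, value in assembly_stats(read_fasta_lengths(handle)).items():
            print(f"{key}: {value}")
```

Usage would be something like `python contig_stats.py assembly1.fasta` (hypothetical script name).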
Pretty much every contig is big enough to include repetitive elements of some sort, so blastn output is not of much value.
It would be useful to add the size range of the contigs you have. Some of the solutions below may not be usable if you have large contigs. Using a program like LASTZ may be your best bet.
The simplest approach is to use a pairwise alignment tool such as BLAST (blastall/blastn) or BLAT, with one assembly (assembly1) as the query and the other (assembly2) as the database/subject. Any assembly2 contig that receives no hits from the assembly1 queries is specific to assembly2.
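That suggestion can be sketched in Python, assuming assembly1 was searched against an assembly2 database with default 12-column tabular output (`-outfmt 6` in BLAST+, `-m 8` in legacy blastall). Rather than only listing assembly2 contigs with zero hits, it merges the subject intervals on each contig and reports contigs whose covered fraction is at or below a cutoff, so contigs hit only by a short stretch are also caught. Assembly2 contig lengths are passed in as a dict (e.g. parsed from its FASTA). The identity, alignment-length and coverage thresholds are hypothetical defaults, not tested values.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def subject_intervals(blast_lines: Iterable[str], min_pident: float = 90.0,
                      min_length: int = 500) -> Dict[str, List[Interval]]:
    """Collect subject intervals per subject contig from -outfmt 6 lines.
    min_pident and min_length are hypothetical filtering thresholds."""
    hits: Dict[str, List[Interval]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        sseqid, pident, length = f[1], float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])
        if pident < min_pident or length < min_length:
            continue
        # Minus-strand hits report sstart > send; normalise the order.
        hits.setdefault(sseqid, []).append((min(sstart, send), max(sstart, send)))
    return hits


def covered_bases(intervals: List[Interval]) -> int:
    """Number of bases covered after merging overlapping/adjacent intervals."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def assembly2_specific(subject_lengths: Dict[str, int], blast_lines: Iterable[str],
                       max_covered_fraction: float = 0.1) -> Dict[str, float]:
    """Return {contig: covered fraction} for assembly2 contigs with little
    or no coverage by assembly1 hits (zero-hit contigs report 0.0)."""
    hits = subject_intervals(blast_lines)
    result: Dict[str, float] = {}
    for contig, length in subject_lengths.items():
        frac = covered_bases(hits.get(contig, [])) / length
        if frac <= max_covered_fraction:
            result[contig] = frac
    return result
```

Setting `max_covered_fraction=0.0` reduces this to the strict "no hits at all" rule described above.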
The complication with that approach is repetitive elements that are missed by DUST and by the relevant repeat databases. At first glance, it looks like I'll need to build a repeat database for this species before I can proceed with something like that.
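As a rough stopgap until a species-specific repeat database exists, hits could be filtered by copy number: if the same query region aligns to more than a few distinct assembly2 contigs, it is probably repetitive and is dropped. This Python sketch is a heuristic, not a replacement for proper repeat masking; it assumes default 12-column tabular BLAST output, `max_copies` and `min_overlap` are hypothetical values, and genuine multicopy gene families will be discarded along with repeats. The pairwise overlap check is quadratic in the number of hits per query contig, so it may be slow for queries with very many hits.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (qseqid, sseqid, qstart, qend, sstart, send), coordinates normalised so start <= end
Hit = Tuple[str, str, int, int, int, int]


def parse_hits(blast_lines: Iterable[str]) -> List[Hit]:
    hits: List[Hit] = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qstart, qend = sorted((int(f[6]), int(f[7])))
        sstart, send = sorted((int(f[8]), int(f[9])))
        hits.append((f[0], f[1], qstart, qend, sstart, send))
    return hits


def overlap_fraction(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Overlap of interval b onto interval a, as a fraction of a's length."""
    ov = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return max(ov, 0) / (a[1] - a[0] + 1)


def drop_repeat_hits(hits: List[Hit], max_copies: int = 3,
                     min_overlap: float = 0.5) -> List[Hit]:
    """Discard hits whose query region also aligns to more than max_copies
    distinct subject contigs (the hit's own subject is counted)."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        by_query[hit[0]].append(hit)
    kept: List[Hit] = []
    for qhits in by_query.values():
        for hit in qhits:
            region = (hit[2], hit[3])
            subjects: Set[str] = {
                other[1] for other in qhits
                if overlap_fraction(region, (other[2], other[3])) >= min_overlap
            }
            if len(subjects) <= max_copies:
                kept.append(hit)
    return kept
```

The surviving hits can then be used to look for assembly2 contigs with no remaining hits, as suggested above.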