Hello, I have found a region of a vertebrate genome assembly that, when resequencing data is aligned to it, contains a large number of discordant reads and larger-than-expected insert sizes.
This looks like an error in the genome assembly, for example contigs that are duplicated or misordered.
Therefore I would like to try to "rescaffold" this region and correct contig ordering, remove duplicate sequence, incorporate missing sequences, etc.
Are there any guidelines for which programs to use? Should I create an entire de novo assembly from the resequencing data and compare it to the original, or can I, for example, run specific steps of the ABySS (or another program's) pipeline to fix the assembly?
There's nothing in the ABySS toolset for identifying/cutting misassemblies, unfortunately. I noticed that there is a tool called NxRepair that detects/cuts misassemblies using mate pair reads, but I have not tried it.
It may be an assembly error, or it may be real biological variation.
I'm 99% sure it is an assembly error, since all of the resequencing data shows the same odd pattern. But even if it were biological variation, that would be interesting, and you might want to resolve the variant genomic assembly in that region!
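One way to put a number on "the same odd pattern" is to compute the fraction of discordant reads per window for each sample and check whether the same windows stand out every time. Below is a minimal sketch that reads SAM text lines (for example from `samtools view`). The function name `discordant_fraction`, the 1 kb window, and the 1000 bp insert-size cutoff are assumptions made for illustration; the cutoff should be set from your own library's insert-size distribution.

```python
from collections import defaultdict

# SAM FLAG bits (from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def discordant_fraction(sam_lines, window=1000, max_insert=1000):
    """Return {(contig, window_start): fraction of discordant reads}.

    A read counts as discordant here if it is not flagged as a proper
    pair, its mate is unmapped, or |TLEN| exceeds max_insert.
    window and max_insert are illustrative defaults, not recommendations.
    """
    total = defaultdict(int)
    discordant = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # keep only primary alignments of mapped, paired reads
        if not flag & PAIRED or flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        contig, pos, tlen = fields[2], int(fields[3]), int(fields[8])
        key = (contig, (pos - 1) // window * window)  # 0-based window start
        total[key] += 1
        if (not flag & PROPER_PAIR or flag & MATE_UNMAPPED
                or abs(tlen) > max_insert):
            discordant[key] += 1
    return {key: discordant[key] / total[key] for key in total}
```

Running this once per sample and comparing the resulting dictionaries gives a rough picture: windows that are elevated in every sample point toward a problem in the reference, while windows elevated in only some samples look more like real variation.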
You can probably break the contigs into fragments at the breakpoints or assembly errors, then re-scaffold the fragments with MeDuSa or another scaffolder.
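The contig-breaking step can be sketched in a few lines of Python. This is only an illustration: the function name `split_at_breakpoints` and the fragment naming scheme are made up here, breakpoints are assumed to be 0-based cut positions you have already chosen (for example from the discordant-read signal), and the scaffolder is run separately on the resulting fragments.

```python
def split_at_breakpoints(name, seq, breakpoints, min_len=1):
    """Cut seq at the given 0-based positions.

    Returns a list of (fragment_name, fragment_sequence) tuples.
    Positions outside the open interval (0, len(seq)) are ignored,
    and fragments shorter than min_len are dropped.
    """
    cuts = sorted({b for b in breakpoints if 0 < b < len(seq)})
    bounds = [0] + cuts + [len(seq)]
    fragments = []
    for i, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1):
        if end - start >= min_len:
            # hypothetical naming scheme: original name plus coordinates
            fragments.append((f"{name}_frag{i}_{start}_{end}", seq[start:end]))
    return fragments


def to_fasta(records, width=60):
    """Format (name, sequence) tuples as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `split_at_breakpoints("ctg1", "AAAACCCCGGGG", [4, 8])` yields three fragments of four bases each, and `to_fasta` turns them into text that can be written to a file and passed to a scaffolder. Keeping the original coordinates in the fragment names makes it easier to trace the new scaffold back to the old assembly.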
Thanks, Rohit. Do you have any other recommendations for scaffolders? I will give MeDuSa a try anyway!
RACA