Hello all,
I have a Unicycler assembly of a bacterial genome from PacBio and Illumina reads. It's a rather small but repetitive genome, and it didn't assemble into one circular chromosome, despite 500x long-read coverage.
I've tried a few other options (i.e. long-read-only assemblers), and they didn't produce a finished genome either.
I've read that it's possible to finish the assembly by inspecting the graph in Bandage and aligning reads to it. However, it's not obvious how to do this, or with which tools. Bandage offers BLAST functionality, but I don't think I can BLAST 500x coverage of PacBio reads against the graph. Would it make sense to get a consensus set of well-corrected reads with Racon? And how does one identify the subset of long reads that potentially span the contig ends?
Thank you for any suggestions.
What are the quality and length of the long reads? If you have ILMN reads handy, I'd recommend using Ryan Wick's Filtlong to keep only the longest, highest-quality reads. Maybe down to ~100X coverage? You can use the ILMN reads as a reference for filtering the Nanopore reads. Filtlong also comes with a very handy script for quickly generating read stats before and after filtering: https://github.com/rrwick/Filtlong/tree/master/scripts
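To make "down to ~100X coverage" concrete: the target is simply genome size times desired depth in total bases. Below is a minimal Python sketch of that arithmetic, plus a length-only read selection as a simplified stand-in. Filtlong itself scores reads on length and quality (and on Illumina k-mer agreement when given Illumina reads), so this is not its algorithm; the genome size and read lengths are hypothetical.

```python
def target_bases(genome_size: int, coverage: float) -> int:
    """Total bases needed to reach the requested depth (genome_size x coverage)."""
    return int(genome_size * coverage)


def select_longest_reads(read_lengths: dict[str, int], genome_size: int,
                         coverage: float) -> list[str]:
    """Keep the longest reads until their summed length reaches the target.

    Simplified, length-only illustration; real filtering tools also weigh quality.
    """
    goal = target_bases(genome_size, coverage)
    kept: list[str] = []
    total = 0
    # Longest reads first; ties broken by read name so the result is deterministic.
    for name, length in sorted(read_lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        if total >= goal:
            break
        kept.append(name)
        total += length
    return kept


if __name__ == "__main__":
    # Hypothetical 2 Mb genome at ~100x depth -> 200 Mb of reads.
    print(target_bases(2_000_000, 100))
    # Toy example: a 500 bp "genome" at 100x needs 50,000 bases.
    reads = {"r1": 30_000, "r2": 12_000, "r3": 25_000, "r4": 5_000}
    print(select_longest_reads(reads, 500, 100))
```

With Filtlong itself, the same depth target is expressed through its `--target_bases` option (e.g. 200000000 for a 2 Mb genome at 100x).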
As a matter of fact, I do have some Illumina! Thank you, very good suggestion.
Wow, I completely misread PB reads as Nanopore, sorry. Does Filtlong work on PacBio reads?
Yes, I've run it on my data and it worked very well (I ran it against Illumina reads).
How many contigs do you have in your assembly? I played just a little with Bandage; it helped me decide where to design PCR primers to link contigs and (possibly later) sequence across the gaps to finish the assembly. I had only MiSeq data, as PacBio is still a rarity around here. Of course, this approach is only useful if the gaps are small and there aren't too many of them.
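For the PCR approach, the primer templates are the contig ends that face each gap. A minimal Python sketch of pulling those flanks out of a contig FASTA, assuming you have already read off from Bandage which two contigs meet at a gap and in which orientation ("+" or "-") each is traversed. The contig names and the 300 bp flank length are hypothetical; the primer design itself would be done in a separate tool.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    seqs: dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gap_flanks(seqs: dict[str, str], left: str, left_strand: str,
               right: str, right_strand: str, flank: int = 300) -> tuple[str, str]:
    """Return (last `flank` bp of the left contig, first `flank` bp of the right contig).

    Each contig is first oriented as it is traversed across the gap, so both
    flanks read in the same direction as the (unknown) gap sequence.
    """
    a = seqs[left] if left_strand == "+" else revcomp(seqs[left])
    b = seqs[right] if right_strand == "+" else revcomp(seqs[right])
    return a[-flank:], b[:flank]


if __name__ == "__main__":
    demo = read_fasta(">c1 len=8\nAAAACCCC\n>c2\nGGGG\nTTTT\n")
    print(gap_flanks(demo, "c1", "+", "c2", "-", flank=3))
```

The forward primer would sit in the left flank; the reverse primer is designed against the reverse complement of the right flank.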
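I think you could map the consensus reads with minimap2 and look for reads that appear chimeric against the assembly, i.e. reads with alignments to different contigs; those are candidates for spanning the contig ends.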
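A minimal Python sketch of that idea, assuming the reads were aligned to the contigs with minimap2's default PAF output (for example `minimap2 -x map-pb contigs.fasta reads.fastq > reads.paf`; the file names are placeholders). It reports reads whose confident alignments touch the ends of two different contigs (candidate joins), and reads with separate alignments at the end and the start of the same contig (candidate circularization). The thresholds `end_window`, `min_mapq` and `min_aln_len` are arbitrary illustrative defaults, not values from this thread.

```python
import sys
from collections import defaultdict


def parse_paf(lines, min_mapq=20, min_aln_len=1000):
    """Yield (read, contig, contig_len, t_start, t_end) for confident PAF alignments."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue
        # PAF column 11 (0-based 10) is alignment block length, column 12 is MAPQ.
        if int(f[11]) < min_mapq or int(f[10]) < min_aln_len:
            continue
        yield f[0], f[5], int(f[6]), int(f[7]), int(f[8])


def find_bridging_reads(lines, end_window=1000, **filters):
    """Return (joins, circular), each mapping read name -> sorted contig names.

    joins: reads whose alignments touch ends of two or more different contigs.
    circular: reads with one alignment near a contig's start and a *different*
    alignment near that same contig's end.
    """
    ends = defaultdict(list)  # read -> [(contig, "start"/"end", alignment index)]
    for i, (read, contig, tlen, ts, te) in enumerate(parse_paf(lines, **filters)):
        if ts <= end_window:
            ends[read].append((contig, "start", i))
        if tlen - te <= end_window:
            ends[read].append((contig, "end", i))

    joins, circular = {}, {}
    for read, hits in ends.items():
        contigs = {c for c, _, _ in hits}
        if len(contigs) >= 2:
            joins[read] = sorted(contigs)
        for c in sorted(contigs):
            starts = {i for cc, side, i in hits if cc == c and side == "start"}
            stops = {i for cc, side, i in hits if cc == c and side == "end"}
            # Require two separate alignments, so a read that merely covers a
            # short contig end to end is not mistaken for a circularizing read.
            if any(s != e for s in starts for e in stops):
                circular.setdefault(read, []).append(c)
    return joins, circular


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        joins, circular = find_bridging_reads(handle)
    for read, contigs in joins.items():
        print("join", read, ",".join(contigs), sep="\t")
    for read, contigs in circular.items():
        print("circular", read, ",".join(contigs), sep="\t")
```

The resulting read names could be used to extract just those reads (e.g. with `seqtk subseq`) and BLAST only them against the graph in Bandage, which is far more manageable than the full 500x set.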
I have just a few contigs - the assembly is almost complete.