Hi, I have nanopore reads and a very fragmented genome assembly (~500 contigs for a 16-20 Mb genome) that was built from Illumina reads, but I do not have the Illumina reads themselves. I used Canu to generate a de novo assembly (44 contigs) from the nanopore reads (~30x). Without Illumina reads I could not polish this de novo assembly, so many of the ORFs could not be annotated because of base-level errors. Is there any way to use the Illumina-assembled contigs to correct those base-level errors and improve the assembly quality? I have also tried LINKS and SMIS, which took the fragmented assembly from ~500 contigs down to ~200, but we need a better assembly for our downstream analysis. I would appreciate any suggestions. We might get some Illumina reads in a month or so, but I wanted to know whether anything can be done with what we have now.
Thanks!
Thanks for your suggestions. I'll give the nanopore polishing tools a try.
We did not generate that Illumina assembly ourselves. It is available from NCBI, but the raw reads are not.
I have not yet tried polishing a short-read assembly with long reads (and I would assume one shouldn't if there are other options).
My first suggestion would be to contact the author of the paper and ask for the Illumina reads.
If you really have to work with the short-read assembly plus nanopore reads, then I guess your goal is not to improve the quality of the existing sequences, but rather to link contigs and resolve repeats. I would not expect racon or nanopolish to be of much use here. But you could try, of course.
From a cursory search: Long Read Gapcloser and GMcloser seem to be built specifically for your task.
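If you do try one of these, it helps to compare basic contiguity numbers (contig count, total length, N50) before and after each run, the same way you went from ~500 to ~200 contigs with LINKS and SMIS. Below is a minimal Python sketch for that. The function names are my own and not part of any tool mentioned here, and it assumes plain, uncompressed FASTA text.

```python
from typing import Dict, List


def contig_lengths(fasta_text: str) -> List[int]:
    """Return the length of every sequence in FASTA-formatted text."""
    lengths: List[int] = []
    current = 0
    in_record = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if in_record:
                lengths.append(current)
            current = 0
            in_record = True
        elif in_record:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(fasta_text: str) -> Dict[str, int]:
    """Contig count, total bases, and N50 for one assembly."""
    lengths = contig_lengths(fasta_text)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

To use it, read each assembly with something like `summarize(open(path).read())` (where `path` is your own file) and compare the dictionaries. Keep in mind that these numbers only describe contiguity; they say nothing about base-level accuracy, so they will not tell you whether the ORF problem is fixed.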