Hello everyone!
I'm planning to use proovread to correct some PacBio sequences and use them to assemble a plant genome (around 400 Mbp). I currently have 30x coverage of PacBio data, 88x coverage of HiSeq2000 data and 24x coverage of MiSeq data (after quality filtering and paired-end merging of the Illumina sequences). The proovread manual suggests a short-read coverage of around 30-50x.
Is there any reason, aside from computational time, to keep the short-read coverage ≤50x? Will a higher coverage (112x, i.e. all of my HiSeq and MiSeq data combined) improve the results obtained from proovread?
Or is there any other hybrid method you suggest I could use to benefit from both my Illumina and PacBio data (e.g., DBG2OLC, ABySS)?
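For reference, here is the coverage arithmetic behind the question as a minimal Python sketch. The variable and function names are hypothetical and it does not call proovread; it only shows what fraction of the short reads would have to be kept to land at the upper end of the suggested 30-50x range, assuming reads are subsampled uniformly at random.

```python
# Coverage figures from the post (all approximate).
GENOME_SIZE_BP = 400_000_000   # ~400 Mbp plant genome
HISEQ_COV = 88                 # HiSeq2000 coverage after filtering
MISEQ_COV = 24                 # MiSeq coverage after filtering and merging
TARGET_COV = 50                # upper end of the range the manual suggests


def subsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov down to target_cov.

    Assumes uniform random subsampling, so coverage scales linearly
    with the fraction of reads kept. Never returns more than 1.0.
    """
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    return min(1.0, target_cov / current_cov)


total_short_read_cov = HISEQ_COV + MISEQ_COV            # combined Illumina coverage
fraction_to_keep = subsample_fraction(total_short_read_cov, TARGET_COV)
bases_at_target = TARGET_COV * GENOME_SIZE_BP           # bases needed for the target

print(f"combined short-read coverage: {total_short_read_cov}x")
print(f"fraction of reads to keep for {TARGET_COV}x: {fraction_to_keep:.3f}")
print(f"bases needed at {TARGET_COV}x: {bases_at_target:,}")
```

So with everything pooled, a bit under half of the Illumina reads would already reach 50x.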
Thanks!
Thank you very much for your reply!
I think I'll try both proovread + Canu and DBG2OLC to see which gives me the better results.
Canu takes uncorrected PacBio reads, so there is no need to use proovread with Canu.
The Canu documentation (release 1.3) still recommends polishing for 'best accuracy' (sic).