I'd like to use the Canu assembler to assemble my diploid organisms sequenced with MinION. Instead of smashing heterozygous sites to a single allele, I want Canu to retain both alleles in the assembled sequence. I am aware that I will therefore end up with a redundant assembly in which each contig is present twice, once per allele. The Canu documentation explains how to set the parameters to avoid collapsing heterozygous sites for PacBio datasets:
corOutCoverage=200 "batOptions=-dg 3 -db 3 -dr 1 -ca 500 -cp 50"
The corOutCoverage parameter ensures that more reads are corrected; the batOptions parameter determines at which error rate Canu will try to correct and thereby smash my heterozygous sites. However, I don't see how these parameters do that or how I can adjust them for nanopore reads. My nanopore reads have a 10% error rate; the assembly has a 0.5% error rate. It seems to me that those error rates are easily distinguished from true variation due to heterozygosity (50% divergence). Can someone explain what the batOptions parameters do (I can't find them in the bogart reference manual)? Or can someone share their experience of using Canu for diploid assembly with nanopore reads?
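To illustrate the intuition about error rate versus heterozygosity, here is a rough back-of-the-envelope sketch. The coverage of 40 and the cutoff of 12 reads are numbers I made up for illustration, and the model is deliberately crude: it treats read errors as independent, assumes in the worst case that every error at a site produces the same wrong base, and ignores errors at the heterozygous site itself. Real nanopore errors are often systematic, so this is only meant to show the kind of separation I have in mind, not how Canu actually decides.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# All four values below are illustrative assumptions, not Canu settings.
COVERAGE = 40         # assumed read depth at one site
MIN_ALT_READS = 12    # assumed cutoff: reads that must disagree with the consensus
READ_ERROR = 0.10     # per-base read error rate quoted above
HET_FRACTION = 0.50   # expected share of reads from the second allele

# Chance that sequencing errors alone reach the cutoff at a homozygous site.
p_error_only = binom_tail(COVERAGE, MIN_ALT_READS, READ_ERROR)

# Chance that a genuinely heterozygous site reaches the same cutoff.
p_true_het = binom_tail(COVERAGE, MIN_ALT_READS, HET_FRACTION)

print(f"errors alone reach the cutoff:    {p_error_only:.6f}")
print(f"heterozygous site reaches cutoff: {p_true_het:.6f}")
```

Under these assumptions the two probabilities sit at opposite ends of the range, which is why I would expect the two cases to be separable in principle.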
I would go with the recommendation in the Canu documentation to phase by other means and then reassemble. I just released this pipeline to do this using WhatsHap. However, if you don't have Illumina reads, then to call the variants to be phased you might want to use something like DeepVariant, which seems to be the best option at the moment.
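The "phase, then reassemble" idea boils down to sorting reads into one bin per haplotype and assembling each bin separately. Below is a minimal sketch of that sorting step only. It is not taken from the pipeline mentioned above: the function name, the `H1`/`H2` labels, and the choice to copy unassigned reads into every bin are all assumptions made for illustration, and the read-to-haplotype assignments are assumed to come from whatever phasing tool you use.

```python
def split_reads_by_haplotype(read_names, haplotype_of, share_unassigned=True):
    """Group read names into one bin per haplotype label.

    haplotype_of maps read name -> label (for example "H1" or "H2").
    Reads missing from the mapping are treated as unassigned. With
    share_unassigned=True they are copied into every bin, so regions
    without heterozygous sites still get coverage in both assemblies.
    """
    labels = sorted(set(haplotype_of.values()))
    bins = {label: [] for label in labels}
    for name in read_names:
        label = haplotype_of.get(name)
        if label is not None:
            bins[label].append(name)
        elif share_unassigned:
            for other in labels:
                bins[other].append(name)
    return bins


# Toy example: r3 could not be phased, so it lands in both bins.
reads = ["r1", "r2", "r3", "r4"]
assignments = {"r1": "H1", "r2": "H2", "r4": "H1"}
bins = split_reads_by_haplotype(reads, assignments)
for label, names in bins.items():
    print(label, names)
```

Each bin would then be written out as its own read set and given to the assembler as a separate run. Whether unphased reads should go to both bins or be dropped is a judgment call that depends on how much of the genome lacks heterozygous sites.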