I am working on a mammalian genome project. We have an assembly from PacBio reads with N50 = 6.2 Mb. We would like to generate some long-range data that could be used for scaffolding. I am aware of several options, including Nanopore, Hi-C (with Dovetail or Phase Genomics), optical maps (Bionano), and synthetic long reads (10X). Can you share your experiences with these technologies? We need help choosing which one(s) to use.
I recently got a genome back from Dovetail that increased scaffold N50 from ~100 kb to ~50 Mb. Yes, that's a 500X increase in contiguity.
N50 should not be the only stat for assessing how good a genome is, but that is still pretty impressive. Gene discovery/BUSCO stats also improved significantly.
A few caveats though:
We originally had ~100X coverage (PE and MP libraries), and Dovetail sequenced a further ~105X, so they roughly doubled our coverage.
Dovetail's scaffolding algorithm is a bit of a black box. It's a probabilistic model that orders and orients contigs based on the number of PE read pairs linking them. The insert sizes are variable but very long, since the pairs come from their Chicago/Hi-C prep. The gist: the more links between two contigs, the closer together they probably are (see the toy sketch after these caveats).
The probabilistic model can have trouble orienting contigs, which would show up as artificial inversions, but in practice this doesn't seem to happen often.
Good scaffolding is heavily dependent on how heterozygous your genome is; collapsing redundant haplotypes with a tool like Redundans beforehand helps a lot.
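To make the "more links = closer" idea concrete, here is a toy sketch in Python. This is my own illustration, not Dovetail's actual model (theirs is a full likelihood calculation over link counts and separations, not greedy chaining), and the input is a hypothetical pre-digested list of which contig each mate of a read pair aligned to:

```python
# Toy sketch of link-count scaffolding -- NOT Dovetail's algorithm, which
# also models insert-size distributions and contig orientation.
# Hypothetical input: one (contigA, contigB) tuple per Chicago/Hi-C read
# pair, e.g. parsed out of a name-sorted BAM beforehand.
from collections import Counter

pairs = [
    ("ctg1", "ctg2"), ("ctg1", "ctg2"), ("ctg2", "ctg3"),
    ("ctg1", "ctg3"), ("ctg2", "ctg3"), ("ctg2", "ctg3"),
]

# Count inter-contig links; pairs mapping within one contig are uninformative.
links = Counter(tuple(sorted(p)) for p in pairs if p[0] != p[1])

# Union-find so we can refuse joins that would close a circular path.
parent = {}
def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

# Greedy chaining: accept joins from most- to least-supported, skipping any
# that would give a contig more than two neighbours or create a cycle.
degree = Counter()
for (a, b), n in links.most_common():
    if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
        parent[find(a)] = find(b)  # merge the two chains
        degree[a] += 1
        degree[b] += 1
        print(f"join {a} -- {b}  ({n} supporting pairs)")
```

The real model weighs the evidence probabilistically rather than greedily, but the intuition is the same.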
I have also tried Nanopore for scaffolding. My impression is that unless your genome is relatively small, it's not really worth the money, and there don't seem to be many mature tools for it either. I ended up using lastal to align contigs to the Nanopore reads and scaffolding from that alignment info; these days you could probably use minimap2 for the alignment instead. A sketch of the link-finding step is below.
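For anyone wanting to try this route, here is roughly what mining the alignments for joins looks like in Python, assuming you map the reads against your contigs (the opposite orientation of my lastal run, but the natural one for minimap2, e.g. `minimap2 -x map-ont contigs.fa reads.fq > aln.paf`). The file name and thresholds are placeholders:

```python
# Minimal sketch: a Nanopore read whose alignments span two different
# contigs is one piece of evidence that those contigs are adjacent.
from collections import defaultdict, Counter
from itertools import combinations

MIN_ALN_LEN = 1000   # ignore short, likely-spurious alignments (placeholder)
MIN_SUPPORT = 3      # require several independent spanning reads (placeholder)

def contigs_hit_per_read(paf_path):
    """Map read name -> set of contigs it aligns to (PAF columns 1 and 6)."""
    hits = defaultdict(set)
    with open(paf_path) as fh:
        for line in fh:
            f = line.rstrip("\n").split("\t")
            read, contig = f[0], f[5]
            aln_len = int(f[3]) - int(f[2])  # query end - query start
            if aln_len >= MIN_ALN_LEN:
                hits[read].add(contig)
    return hits

def count_links(hits):
    """Count reads supporting each contig pair."""
    links = Counter()
    for contigs in hits.values():
        for pair in combinations(sorted(contigs), 2):
            links[pair] += 1
    return links

if __name__ == "__main__":
    links = count_links(contigs_hit_per_read("aln.paf"))
    for (a, b), n in links.most_common():
        if n >= MIN_SUPPORT:
            print(f"{a}\t{b}\t{n} spanning reads")
```

From there you would still need to order and orient the contigs and estimate gap sizes, which is exactly where the mature tooling is thin.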