For high-quality telomere-to-telomere assemblies, is short-read polishing still necessary?
14 months ago
Mark ▴ 20

I'm new to DNA sequencing, but our lab has bought a lot of nanopore equipment to sequence algae samples de novo. I'm not caught up on the genome assembly software literature, but I know that nanopore assemblies are plagued by homopolymer indel errors. Are current assembly tools and pipelines robust enough to mitigate homopolymer errors, assuming a read depth of around 50X across the genome? I'm using R10 chemistry, if that makes a difference.
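For anyone who wants to gauge how exposed a draft assembly is to this error class, here is a minimal sketch that lists homopolymer runs in a sequence. The function name `homopolymer_runs` and the `min_len=5` threshold are illustrative assumptions, not taken from any particular tool.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=5):
    """Return (start, base, length) for every run of a single base
    that is at least min_len long. Positions are 0-based.

    min_len=5 is an arbitrary illustrative threshold (an assumption).
    """
    runs = []
    pos = 0
    # groupby yields one group per stretch of identical consecutive bases
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGTAAAAAAGGCTTTTTC"))
# -> [(4, 'A', 6), (13, 'T', 5)]
```

Comparing the run lengths at the same loci before and after polishing is one rough way to see whether homopolymer indels are actually being corrected.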

Thanks.

genome illumina sequencing nanopore wgs • 1.3k views
14 months ago

Polishing with Illumina reads, using tools such as HyPo and Racon, is still standard practice. Also use Medaka for the initial long-read polishing.

That said, ONT data has got massively better in the last two years and duplex will be another big advance.

14 months ago
LauferVA 4.5k

In short, yes. Best practices for generating T2T diploid genome assemblies currently use HiFi and ONT-UL reads to build the backbone of the assembly, then polish, for instance using parental short-read data, Hi-C, or Strand-Seq (for phasing and for resolving repeat regions in the resulting graph). Verkko and Hifiasm both achieve ~Q60 using this recipe alone, with only a few assembly errors. Polishing uses shorter but more accurate read sets in addition to the long reads.
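To put the ~Q60 figure in context, here is a small sketch of the standard Phred relationship, Q = -10 * log10(P), where P is the per-base error probability. The 100 Mb genome size is a hypothetical value chosen only for illustration, and the function names are my own.

```python
import math


def phred_to_error_rate(q):
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(p)


def expected_errors(q, genome_size):
    """Expected number of erroneous bases in an assembly of genome_size bp."""
    return phred_to_error_rate(q) * genome_size


GENOME_SIZE = 100_000_000  # hypothetical 100 Mb genome (assumption)

for q in (30, 40, 50, 60):
    errors = expected_errors(q, GENOME_SIZE)
    print(f"Q{q}: ~{errors:,.0f} expected errors in {GENOME_SIZE:,} bp")
```

Q60 corresponds to roughly one error per million bases, so each 10-point step in Q is a tenfold change in the number of residual errors.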

However, HPRC (formerly T2T) and other production consortia (VGP, ERGA) are moving away from linked reads and optical mapping. At this point, HiFi or ONT R10 duplex reads give better resolution than linked reads, which are also reported to have less even coverage. Optical maps (Bionano, for instance) have far fewer nicking-enzyme sites in centromeric regions, but apparently still have some utility elsewhere (e.g. telomeric regions), where ONT UL read coverage drops off anyway.

If things keep going like this, short-read sequencing and optical genome mapping will have increasingly limited utility in another few years' time.

14 months ago
Buffo ★ 2.4k

In my experience, it mainly depends on three variables.

First, the sequencing technology. PacBio HiFi can produce high-quality reads, in some cases matching Illumina quality. MinION is still a different story: I've attended a couple of workshops recently, and the recent chemistry seems to have improved substantially, but not enough for de novo assemblies without polishing.

Second, the coverage. In my experience, you might not find any difference between a 50X PacBio HiFi assembly and its polished version (for bacterial or other small genomes). It is different for MinION, for the reasons listed above.

Third, consider genome size and complexity. For instance, sequencing a typical human chromosome is not the same as sequencing chromosome X or Y, which are full of repeats and have a very low %G+C. You'd need higher coverage and longer reads for those. So, yes, size (and complexity) matters.
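The coverage and %G+C points above can be checked with a few lines of code. This is a minimal sketch, assuming read lengths and sequences are already loaded into memory; the function names, the 1 kb default window, and the example numbers are illustrative assumptions.

```python
def mean_depth(read_lengths, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (N and others are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window=1000):
    """GC fraction in consecutive non-overlapping windows of the sequence."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Hypothetical example: 500 reads of 10 kb over a 100 kb genome
print(mean_depth([10_000] * 500, 100_000))
print(windowed_gc("GGGGAAAA", window=4))
```

Windows with an unusually low GC fraction or a local drop in depth are the places where a second look at the assembly, and possibly extra polishing, is most likely to pay off.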

I wouldn't say there is a general rule, nor that it is a "best practice" to polish by default. Every case is different, and it is worth analyzing each one independently.

Hope it helps.


I agree with much of this, but not with the last sentence. Best practices are emerging here, and they are already being used for consortium-scale data generation.


Best practices also include optimizing resources (or they should, in my opinion). Consortium scale is something different and might not apply here: Mark is asking about a single de novo assembly.
