Question

Forum: What is, in your opinion, the current state-of-the-art sequencing platform for genome assembly (Nov 2019)?

5.2 years ago

Antonio R. Franco ★ 5.2k

Things are moving very fast...

A couple of years ago, I would have had no doubts. To assemble a 1 Gb genome, I would have gone for Illumina paired-end sequencing along with some approach to get good scaffolding (using either a modest amount of long reads or a reference). In my opinion, long reads from the first-generation PacBio instruments were so expensive and had such poor throughput that, for genomes of this size (1 Gb), they were useful mainly for scaffolding, where they replaced mate-pair libraries with clear advantages. Nanopore was still under heavy development.

But only two years later, I have substantially changed my mind. I no longer think of using Illumina reads. Instruments such as the PacBio Sequel II can now deliver a throughput of hundreds of Gb with very high quality thanks to their CCS (circular consensus sequencing) approach, yielding reads of very high quality (over Q40). Prices have come down as well, making it affordable. Nanopore has taken the same approach and has become a strong contender.
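For readers less used to Phred scores: a quality value Q maps to a per-base error probability p through Q = -10·log10(p), so Q40 corresponds to about one error per 10,000 bases. A minimal Python sketch of the conversion (the function names are just illustrative, not from any particular library):

```python
import math


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for a per-base error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (20, 30, 40):
        p = phred_to_error(q)
        # Each step of 10 in Q is a tenfold drop in the error rate
        print(f"Q{q}: error rate {p:.0e}, about 1 error per {round(1 / p):,} bases")
```

The "99.9%" accuracy mentioned further down the thread corresponds to Q30 on this scale.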

I would like to open this thread to hear both your opinions and your experiences.

Tags: next-gen-sequencing, assembly, genome

Updated 19 months ago by Ram

Speaking of Hi-C...

Does anybody have anything to say about optical maps as a serious contender?

Reply by Antonio R. Franco, 5.2 years ago

Optical mapping is very much alive. I have seen some rearrangement data for cancer cells that would not be possible to obtain/visualize with any current long or short sequencing technology.

There is a big technology barrier, though. You have to be able to prepare nuclei and then very high molecular weight DNA. Neither is trivial, and both require manual work and expertise. It is especially difficult for plants, where it would be most useful, because plant cell walls are a big barrier to isolating nuclei.

If you don't have a reference or know nothing about the genome, you can use optical mapping to get an idea of its large-scale organization and to orient your sequencing data.

Reply by GenoMax, 5.2 years ago

It was a very useful/promising approach until a few years back, but it has now been (nearly) completely outcompeted by Hi-C methods. The last I heard, all the optical stuff is being phased out (especially for assembly purposes).

Reply by lieven.sterck, 5.2 years ago

According to this article, the optical map approach seems to be very much alive. I'd welcome more comments. Maybe Hi-C is simply far more affordable from every perspective...

Reply by Antonio R. Franco, 5.2 years ago

As they have apparently developed a new approach, it would be weird for them to state that the technique is not used much anymore ;).

As far as I remember, the main issue with optical maps is that they are far from straightforward to create (despite what others/companies want you to believe). The same goes for the other map techniques (physical, genetic, ...): when such maps exist, they will most likely be used to improve scaffolding, but they are rarely created specifically to assist assembly (in contrast to Hi-C, which is).

Reply by lieven.sterck, 5.2 years ago

Not really with the previous machines (Irys?), but it sounds like the new machines (Saphyr?) could be.

Reply by Juke34, 5.2 years ago

score 7 · Answer 1 · 2019-10-31

5.2 years ago

WouterDeCoster 47k

From what I've seen, the best assemblies are obtained with long-read sequencing, plus Hi-C for long-range scaffolding and short-read sequencing for polishing. The last may become unnecessary in the future as long-read accuracy keeps improving.

From what I am reading, short-read sequencing may no longer be required at present.

Reply by Antonio R. Franco, 5.2 years ago

Short reads are definitely still required. Otherwise, artificial frameshifts turn up everywhere.

Reply by colindaven, 5.2 years ago

score 5 · Answer 2 · 2019-10-31

5.2 years ago

Juke34 9.0k

At our sequencing platform, NGI Sweden,
the recommendation for de novo genome sequencing used to be:
Illumina 50x sequencing on HiSeq X or NovaSeq, with several insert sizes (+ mate pairs)

and nowadays is:
100x PacBio (or ONT) only + Hi-C (coverage depends on heterozygosity) + RNA-seq data for annotation.
Short reads for polishing are encouraged when long-read coverage is low.

Having short reads for polishing is encouraged for low coverage in long reads.

Are you using HiFi or standard long-read (CLR) sequencing?

If using HiFi: how much does reading each molecule over and over reduce the coverage that is truly useful for the assembly? I mean, if the Sequel II has a throughput of, say, 100 Gb, most of it goes into redundancy that only improves quality, so this must lead to a reduction in coverage.

If using HiFi, do you still have issues with homopolymers?
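A back-of-the-envelope sketch of the trade-off asked about above, assuming (as a simplification) that each HiFi read consumes roughly `passes` times its own length in raw polymerase bases, and ignoring reads that fail the quality filter. The 100 Gb raw yield comes from the question; the 10 passes and the 1 Gb genome are illustrative assumptions, not measured values, and the helper names are hypothetical.

```python
def hifi_yield_gb(raw_yield_gb: float, passes: float) -> float:
    """Approximate consensus (HiFi) yield: raw bases are spread over `passes` reads of each insert."""
    if passes < 1:
        raise ValueError("passes must be >= 1")
    return raw_yield_gb / passes


def coverage(yield_gb: float, genome_size_gb: float) -> float:
    """Fold coverage = sequenced bases / genome size."""
    return yield_gb / genome_size_gb


def raw_yield_needed_gb(target_coverage: float, genome_size_gb: float, passes: float) -> float:
    """Raw polymerase yield needed to reach `target_coverage` of consensus reads."""
    return target_coverage * genome_size_gb * passes


if __name__ == "__main__":
    raw, passes, genome = 100.0, 10, 1.0  # assumed: 100 Gb raw, ~10 passes, 1 Gb genome
    hifi = hifi_yield_gb(raw, passes)
    print(f"{hifi:.0f} Gb HiFi -> {coverage(hifi, genome):.0f}x of a {genome:.0f} Gb genome")
    print(f"Raw yield needed for 20x HiFi: {raw_yield_needed_gb(20, genome, passes):.0f} Gb")
```

Under these assumptions the usable coverage falls roughly in proportion to the number of passes; what is gained in exchange is per-read accuracy.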

Reply by Antonio R. Franco, 5.2 years ago

Would you mind commenting on your assembly pipeline?

Reply by Mark, 5.2 years ago

FALCON-Phase, basically.

Reply by Juke34, 5.2 years ago

score 4 · Answer 3 · 2019-10-31

edit: I moved my post from comment to answer, and added the link to a blog post.

According what I am reading, short read sequencing could not be longer required at present

At this stage, this is not true. While one can get "99.9%" base accuracy with some clever techniques (e.g. PacBio circular consensus), PacBio and Nanopore still have systematic deletions in homopolymers that can't be corrected with long reads alone. Even a low rate of deletions, say one in a thousand bases, still causes a lot of trouble for gene prediction, due to frameshifts and false stop codons.

This point has been made several times (e.g. “On stuck records and indel errors” or “stop publishing bad genomes”), and I think a couple of papers have even been published highlighting the issue.
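To make the "one in a thousand" figure concrete, here is a rough sketch that assumes indels occur independently at a uniform per-base rate (in reality they cluster in homopolymers, so this is only an approximation) and an assumed coding-sequence length of 1,500 bp. Any single-base deletion inside a coding sequence shifts the reading frame downstream of it. The function names are hypothetical.

```python
def expected_indels(cds_length: int, indel_rate: float) -> float:
    """Expected number of indel errors in a coding sequence of `cds_length` bases."""
    return cds_length * indel_rate


def prob_cds_intact(cds_length: int, indel_rate: float) -> float:
    """Probability that the coding sequence contains no indel at all: (1 - rate) ** length."""
    return (1 - indel_rate) ** cds_length


if __name__ == "__main__":
    length, rate = 1500, 1e-3  # assumed CDS length; 1 indel per 1,000 bases
    p_ok = prob_cds_intact(length, rate)
    print(f"Expected indels per CDS: {expected_indels(length, rate):.1f}")
    print(f"P(CDS free of indels): {p_ok:.2f}")
    print(f"Share of genes likely disrupted: {1 - p_ok:.0%}")
```

Under these assumptions only about 22% of such genes come out free of indels, i.e. about 78% would carry at least one frameshift-causing error.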

score 0 · Answer 4 · 2019-11-02

5.2 years ago

Antonio R. Franco ★ 5.2k

A nice review of modern scaffolding procedures: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006994&type=printable