Question

Forum:Pacific Bio Long Reads vs Illumina Short Reads

0

3.9 years ago

dk0319 ▴ 70

Is anyone who has worked with both Illumina and PacBio NGS data open to discussing their experience with the two platforms? Did you notice any clear strengths or weaknesses, especially for genome assembly, structural variant discovery, and RNA-seq analysis (both differential expression and splice discovery)?

Note: I am aware of the implied benefits of long vs. short reads. I am just curious to hear first-hand accounts.

next-gen RNA-Seq alignment • 2.1k views

ADD COMMENT • link updated 21 months ago by Ram 44k • written 3.9 years ago by dk0319 ▴ 70

2

Based on my experience working with both platforms.
Disclaimer: this is my own paper:
Structural variant calling: the long and the short of it

ADD REPLY • link 3.9 years ago by Medhat 9.8k

0

Thanks for the insights. Has anyone had experience with Bionano's genome imaging platform? If so, what did you think? Did it perform better than PacBio at detecting genomic structural variants?

ADD REPLY • link 3.9 years ago by dk0319 ▴ 70

1

It is a cost-effective way to detect SVs, but it comes with these limitations:

low-accuracy breakpoint resolution
no sequence for identified insertions
short SVs may be missed (it is better suited to very large SVs)
few open-source tools are available to analyze the data

ADD REPLY • link 3.9 years ago by Medhat 9.8k

score 2 · Answer 1 · 2021-02-12

Results for assembling a highly repetitive 1 Gb plant genome with ~100x coverage of paired-end (PE) Illumina data: a 300 Mb assembly (about 1/3 of the genome)

Results for assembling the same genome with ~100x coverage of Sequel 1 PacBio reads: a 950 Mb assembly (~90% of the genome)

These days, I wouldn't even attempt genome assembly with Illumina data alone.

score 1 · Answer 2 · 2021-02-12

Currently, the best approach is a mix of Illumina and PacBio (or Nanopore) sequencing. The first step is to assemble with long reads alone, or to run a hybrid assembly with long and short reads. There are very good assemblers for long-read data alone (e.g. Flye, which already polishes with the long reads); I don't have experience with hybrid assemblers. The second step is one round of long-read polishing; depending on the assembler, the improvements can be dramatic. After that, run at least one round of short-read polishing to correct the systematic homopolymer errors.
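
The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: Flye is the assembler named above, but bwa, samtools and Pilon for the short-read polishing round are my own pick (no polisher is named here), and every file name, the thread count, the `pilon` launcher on the PATH and the `build_pipeline` helper are hypothetical. The separate long-read polishing round is left out because it depends on the assembler. The function only builds the command lines; it does not run anything.

```python
from pathlib import Path


def build_pipeline(long_reads, short_r1, short_r2, out_dir, threads=8):
    """Return the command lines (argv lists) for a long-read assembly
    followed by one round of short-read polishing.

    Assumptions, not taken from the thread: bwa + samtools + Pilon for the
    short-read round, and every path and name used here.
    """
    out = Path(out_dir)
    assembly = out / "flye" / "assembly.fasta"  # Flye writes assembly.fasta to its out-dir
    sam = out / "short_reads.sam"
    bam = out / "short_reads.sorted.bam"

    return [
        # Step 1: long-read assembly; Flye polishes with the long reads itself.
        ["flye", "--pacbio-raw", str(long_reads),
         "--out-dir", str(out / "flye"), "--threads", str(threads)],
        # Step 2: map the short reads back to the draft assembly.
        ["bwa", "index", str(assembly)],
        ["bwa", "mem", "-t", str(threads), "-o", str(sam),
         str(assembly), str(short_r1), str(short_r2)],
        ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # Step 3: short-read polishing to fix homopolymer indels.
        ["pilon", "--genome", str(assembly), "--frags", str(bam),
         "--output", "polished", "--outdir", str(out / "pilon")],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("pacbio.fastq.gz", "illumina_R1.fastq.gz",
                              "illumina_R2.fastq.gz", "asm_out"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. For Nanopore reads, Flye's `--nano-raw` would replace `--pacbio-raw`.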

Long-read data still suffers from a high error rate (or, for PacBio CCS, not a high rate but systematic errors at homopolymers), so assemblies built from long reads alone may have a high rate of missing genes due to frame-shifting assembly errors. As Dave Carlson already noted, the gains in contiguity and in the percentage of the genome recovered can be much higher for long reads than for short reads, though I have never observed anything as dramatic as his report.
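
To make the frame-shift point concrete, here is a toy illustration of how one base dropped from a homopolymer changes everything downstream in a coding sequence. The sequence is made up for the example and the `translate` helper is hypothetical; only the standard genetic code is real.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))


# Toy coding sequence (made up for illustration) with a 6-base A homopolymer.
correct = "ATGAAAAAAGGCTGGTAA"
# The same sequence with one A dropped from the homopolymer.
collapsed = "ATGAAAAAGGCTGGTAA"

print(translate(correct))    # MKKGW*
print(translate(collapsed))  # MKKAG
```

After the deletion the residues downstream of the homopolymer change and the stop codon is lost, which is how a gene predictor ends up missing or truncating the gene.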