Forum: Pacific Bio Long Reads vs Illumina Short Reads
3.9 years ago
dk0319 ▴ 70

Is anyone who has worked with both Illumina and PacBio NGS data open to discussing their experience with the two platforms? Did you notice any clear strengths or weaknesses, especially for genome assembly, structural variant discovery, and/or RNA-seq analysis (both DE and splice discovery)?

Note: I am aware of the implied benefits of long vs short reads. I am just really curious to hear first-hand accounts.

next-gen RNA-Seq alignment • 2.1k views

This is based on my experience working with both.
Disclaimer: This is my own paper:
Structural variant calling: the long and the short of it


Thanks for the insights. Has anyone had experience with Bionano's genome imaging platform? If so, what did you think? Did it perform better than PacBio at detecting genomic structural variants?


It is a cost-effective way to detect SVs, but it comes with these limitations:

  • low-accuracy breakpoint resolution
  • no sequence for identified insertions
  • short SVs may be missed (it is better suited to very large SVs)
  • there is a shortage of open-source tools to analyze the data
3.9 years ago
Dave Carlson ★ 2.1k

Results for assembling a highly repetitive 1 Gb plant genome with ~100x coverage PE Illumina data: 300 Mb assembly (1/3 of the genome)

Results for assembling the same genome with ~100x Sequel 1 PacBio reads: 950 Mb assembly (~90% of the genome)

These days, I wouldn't even attempt genome assembly with Illumina data alone.

3.9 years ago
h.mon 35k

Currently, the best approach is a mix of Illumina and PacBio (or Nanopore) sequencing. The first step is to assemble with long reads alone, or to do a hybrid assembly with long and short reads. There are very good assemblers for long-read data alone (e.g. Flye, which already polishes with the long reads); I don't have experience with hybrid assemblers. The second step is one round of long-read polishing; depending on the assembler, the improvements can be dramatic. After that, do at least one round of short-read polishing to correct the systematic homopolymer errors.
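
The three steps above could be sketched roughly as follows. This is only an illustration, not a tested recipe: Flye is the one tool named in this thread, while minimap2, Racon, BWA, samtools and Pilon are assumed stand-ins for the polishing rounds, and the function name `assembly_plan`, the file names and the output layout are all hypothetical. It also assumes a `pilon` launcher on the PATH. The function only builds the command lines so they can be reviewed before anything is run.

```python
import shlex


def assembly_plan(long_reads, short_r1, short_r2, out_dir="asm", threads=8):
    """Return the shell commands for the three steps, in order, without running them."""
    q = shlex.quote
    t = int(threads)
    flye_dir = f"{out_dir}/flye"
    draft = f"{flye_dir}/assembly.fasta"   # Flye writes its assembly under this name
    paf = f"{out_dir}/long_vs_draft.paf"   # hypothetical intermediate file names
    racon_fa = f"{out_dir}/racon.fasta"
    bam = f"{out_dir}/short_vs_racon.bam"
    return [
        f"mkdir -p {q(out_dir)}",
        # Step 1: long-read-only assembly (Flye polishes with the long reads itself).
        f"flye --pacbio-raw {q(long_reads)} --out-dir {q(flye_dir)} --threads {t}",
        # Step 2: one extra round of long-read polishing (Racon is an assumed choice).
        f"minimap2 -x map-pb -t {t} {q(draft)} {q(long_reads)} > {q(paf)}",
        f"racon -t {t} {q(long_reads)} {q(paf)} {q(draft)} > {q(racon_fa)}",
        # Step 3: one round of short-read polishing (BWA and Pilon are assumed choices).
        f"bwa index {q(racon_fa)}",
        f"bwa mem -t {t} {q(racon_fa)} {q(short_r1)} {q(short_r2)}"
        f" | samtools sort -@ {t} -o {q(bam)} -",
        f"samtools index {q(bam)}",
        f"pilon --genome {q(racon_fa)} --frags {q(bam)} --output pilon --outdir {q(out_dir)}",
    ]
```

Printing the result of `assembly_plan("pacbio.fastq.gz", "illumina_R1.fastq.gz", "illumina_R2.fastq.gz")` line by line gives a script that can be adapted; for Nanopore reads the Flye and minimap2 presets would need to change.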

Long-read data still suffers from a high error rate (or, for PacBio CCS, not a high rate, but systematic errors at homopolymers), so assemblies built from long reads alone may appear to be missing many genes because of frame-shifting assembly errors. As Dave Carlson already noted, the gains in contiguity and in the percentage of the genome recovered can be much higher with long reads than with short reads, though I have never observed anything as dramatic as his report.
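
To make the frame-shift point concrete, here is a toy sketch in Python. The 18 bp sequence is made up for illustration, and `homopolymer_runs` and `translate` are hypothetical helper names, not part of any library. It finds the homopolymer runs in a short ORF, drops one base from the first run (the kind of error long reads tend to make there), and translates both versions with the standard genetic code.

```python
from itertools import groupby

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of one base at least min_len long."""
    runs, pos = [], 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def translate(seq):
    """Translate in frame 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


reference = "ATGAAAAAAGGCTTTTGA"                       # made-up ORF with a 6-base A run
start, base, length = homopolymer_runs(reference)[0]   # (3, 'A', 6)
collapsed = reference[:start] + reference[start + 1:]  # one A lost from the run

print(translate(reference))  # MKKGF*
print(translate(collapsed))  # MKKAF
```

A single missing base changes every residue downstream of the run and removes the stop codon, which is why a gene can look absent or broken in an unpolished long-read assembly even though its sequence is almost entirely there.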
