Question

Which assembler is appropriate for PacBio long reads?

0

6.3 years ago

bioinforesearchquestions ▴ 370

Hi folks,

We had 10 bacterial isolates sequenced on the PacBio platform. Five of them have no reference strain, and five have a reference from a closely related strain.

I need to identify antimicrobial resistance genes in these 10 bacteria. This is the first time I am handling PacBio data.

Is there an assembler that handles long reads? At the moment I don't know the coverage of the samples. Please point me to a reference article if you have come across one for this use case. I found HGAP from PacBio (https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/HGAP#implementations).

The Celera® Assembler link is broken.

Assembly alignment • 5.6k views

Updated 6.3 years ago by gconcepcion ▴ 410 • written 6.3 years ago by bioinforesearchquestions ▴ 370

3

Check this recent review (Table 2 lists many useful programs).

6.3 years ago by GenoMax 147k

1

There are numerous long-read assemblers available. Many are listed here:

https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbx147/4590140

6.3 years ago by Andy ▴ 20

0

Do you only have PacBio data? You should get some Illumina data as well:

On stuck records and indel errors; or “stop publishing bad genomes”

6.3 years ago by h.mon 35k

0

So far I have been told that I will get only PacBio long reads. Why do you say that I should get some Illumina data?

6.3 years ago by bioinforesearchquestions ▴ 370

0

From the blog post I linked:

If you can’t be bothered reading, then the summary is:

BOTH single molecule sequencing technologies (PacBio and Nanopore), their major error mode is insertions / deletions

Once a genome is assembled, some of these errors remain in the assembly

If they are uncorrected, they inevitably cause a frameshift or premature stop codon in protein-coding regions

It’s not that you can’t correct these errors, it’s that mostly, outside of the top assembly groups in the world, people don’t

In short: insertions and deletions are the main error type for PacBio and Nanopore, whereas Illumina reads contain few of them, so you can use Illumina reads to correct the indel errors that remain in a PacBio assembly.
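
To illustrate why an uncorrected indel matters for gene calling, here is a minimal Python sketch. The toy sequence and the `translate` helper are made up for illustration (they are not from the blog post or from any assembler); the codon table is the standard genetic code. Deleting a single base shifts the reading frame and, in this example, introduces a premature stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy ORF: ATG GCT AAA GGT GAA CTG ACC TAA
reference = "ATGGCTAAAGGTGAACTGACCTAA"

# Simulate a single-base deletion (the G at index 3), the typical
# long-read error mode described above.
with_deletion = reference[:3] + reference[4:]

print(translate(reference))      # MAKGELT
print(translate(with_deletion))  # MLKVN
```

Everything after the deletion is read in the wrong frame, so an annotation tool would report a truncated or altered protein for this gene, which is a problem when the goal is to identify resistance genes.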

6.3 years ago by h.mon 35k

score 2 · Answer 1 · 2018-08-02

2

6.3 years ago

gconcepcion ▴ 410

Your best bets are:

HGAP4 (GUI) as a pipeline provided in SMRTLink: https://www.pacb.com/support/software-downloads/

FALCON (command line; bleeding-edge HGAP): http://pb-falcon.readthedocs.io/en/latest/quick_start.html#quick-start

or Canu (command line; essentially the successor to Celera Assembler): https://canu.readthedocs.io/en/latest/quick-start.html https://github.com/marbl/canu
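
Since the question mentions that the coverage is unknown, a rough estimate may help before running any of these. The sketch below assumes the reads have been exported to a plain 4-line-per-record FASTQ file (optionally gzipped) and that you can supply an approximate genome size; the function names and the file path in the usage note are hypothetical. Coverage is simply total sequenced bases divided by genome size.

```python
import gzip


def total_bases(fastq_path: str) -> int:
    """Sum read lengths in a 4-line-per-record FASTQ file (plain or .gz)."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def estimate_coverage(fastq_path: str, genome_size_bp: int) -> float:
    """Approximate fold coverage: total bases / expected genome size."""
    return total_bases(fastq_path) / genome_size_bp
```

For example, `estimate_coverage("reads.fastq", 5_000_000)` would give the approximate fold coverage for an expected 5 Mb genome. For the isolates without a reference, the genome size of a related species is only a rough guess, so treat the result as an order-of-magnitude check.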

1

I'd stay away from PacBio-based assemblers: they're pretty difficult to get working and take FOREVER. Use a third-party assembler, like Canu.

5.0 years ago by andorjkiss ▴ 50