Question

PacBio assemblies only ending up somewhere between 80 and 250 contigs.

6.4 years ago

dylan.lawrence ▴ 100

I have done a lot of de novo assembly with NGS data (Illumina NextSeq and MiSeq) and expect to get only a "pretty good" final assembly. However, I was under the impression that PacBio improved on this greatly, yet I'm struggling to finalize my assemblies.

Currently I have tried the following assemblers:

Canu
HGAP4 (or whatever the pbsmrtpipe de novo assembly pipeline is)
SOAPdenovo with hybrid mode (pacbio+illumina)

I generated my data from a multiplexed run on a PacBio Sequel machine and demultiplexed it with lima.

Of the assemblies, the hybrid did the best. The overall assembly contained ~500 contigs and was twice the expected genome size. However, when I filtered out contigs shorter than 10,000 bp, I ended up with 80 contigs whose total length is extremely close to the expected genome size.
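For anyone wanting to reproduce the length filter described above, here is a minimal sketch in plain Python. The function names (`read_fasta`, `filter_contigs`) are illustrative and not taken from any assembler or from my actual commands; it only assumes the contigs are in ordinary FASTA format.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]], min_len: int = 10_000
) -> Tuple[List[Tuple[str, str]], int]:
    """Keep contigs of at least min_len bases; also return their total length."""
    kept = [(name, seq) for name, seq in records if len(seq) >= min_len]
    total = sum(len(seq) for _, seq in kept)
    return kept, total
```

Called as `filter_contigs(read_fasta(open("assembly.fasta")))` (the file name is a placeholder), it returns the surviving contigs plus their summed length, which can then be compared with the expected genome size.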

What do I do from here? I've tried Circlator, which seems to only try to circularize the contigs themselves. My next step is to consider quickmerge to possibly finalize the assembly.

Has anyone else hit a similar stumbling block in trying to finish a genome using PacBio reads?

pacbio de novo Assembly • 1.6k views

ADD COMMENT • link 6.4 years ago by dylan.lawrence ▴ 100

What is the organism? I suppose it is a bacterium, as you were trying Circlator. It is really strange for the final assembly to be twice the expected genome size. Did you check for contaminants?

ADD REPLY • link 6.4 years ago by h.mon 35k

Not in depth, but I have performed Illumina sequencing on this same sample and there were no contaminants.

ADD REPLY • link 6.4 years ago by dylan.lawrence ▴ 100

I think the problem here is that the genome is diploid or possibly polyploid. For diploid or polyploid genomes, the assembly is often larger than the haploid genome size, because divergent haplotypes can be assembled as separate contigs. The haploid size is what OP wants.

OP, I think you can filter these out with genome-vs-genome alignments. The redundant haplotype contigs will show high identity to other contigs, and you can then remove them.
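As a rough sketch of that filtering idea, assuming the contigs have been aligned against each other and the alignments are in PAF format (the tab-separated format written by aligners such as minimap2), something like the following could flag candidates. The function name and both thresholds (95% identity, 90% coverage) are made-up illustration values, not recommendations, and coverage is taken from the single best alignment rather than merged across alignments.

```python
from typing import Dict, Iterable


def redundant_contigs(
    paf_lines: Iterable[str],
    min_identity: float = 0.95,  # assumed threshold, tune for your data
    min_cover: float = 0.90,     # assumed threshold, tune for your data
) -> Dict[str, str]:
    """Map each flagged contig to a longer contig that largely contains it."""
    flagged: Dict[str, str] = {}
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a complete PAF record
        qname, qlen, qstart, qend = f[0], int(f[1]), int(f[2]), int(f[3])
        tname, tlen = f[5], int(f[6])
        nmatch, alnlen = int(f[9]), int(f[10])
        if qname == tname or qlen == 0 or alnlen == 0:
            continue
        # Only ever flag the shorter contig of a pair (ties broken by name).
        shorter = qlen < tlen or (qlen == tlen and qname > tname)
        if not shorter:
            continue
        identity = nmatch / alnlen          # approximate identity
        cover = (qend - qstart) / qlen      # fraction of the query aligned
        if identity >= min_identity and cover >= min_cover:
            flagged.setdefault(qname, tname)
    return flagged
```

The result is only a list of candidates to inspect; contigs flagged this way should be checked (for example against read coverage) before being dropped.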

If you can post the parameters you used, we can probably suggest better settings for your assembly.

ADD REPLY • link 6.3 years ago by harish ▴ 470