Question

Choosing de novo genome assembly

1

8.4 years ago

s.kyungyong64 ▴ 40

Hi,

I have Ren-seq data from a plant (tomato), sequenced on PacBio (~420 Mb in FASTA), to assemble. The assembled genome is about 2.0 Mb in size. So far I have tried the Geneious and Canu assemblers. The Canu result was better than the Geneious one, but I think I should try some other software as well. Do you have any recommendations that might be worth trying?

Thanks

RNA-Seq genome Assembly • 3.4k views

updated 8.4 years ago by Rox ★ 1.5k • written 8.4 years ago by s.kyungyong64 ▴ 40

0

You have a tiny amount of data, so clearly this is not the raw output of a SMRT cell. Can you describe it in more detail? Are these CCS reads, a consensus after correction, or something else?

8.4 years ago by Brian Bushnell 20k

0

Thanks for catching that. I corrected it. It is the output of a SMRT cell.

8.4 years ago by s.kyungyong64 ▴ 40

score 2 · Answer 1 · 2017-02-24

Hi!

I have also done genome assembly with PacBio data and the Canu assembler, and I was really satisfied with it.
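For reference, here is a minimal sketch of how a Canu run for this kind of data could be put together. It only builds and prints the command line, it does not launch Canu. The prefix, output directory and read file name are hypothetical, the `2m` size is taken from the ~2 Mb figure in the question, and `-pacbio-raw` is the raw-read option as I know it from the Canu 1.x series (later releases use `-pacbio` instead), so check it against your installed version.

```python
import shlex


def canu_command(prefix, out_dir, genome_size, reads):
    """Build (but do not run) a Canu command line for raw PacBio reads.

    Assumption: Canu 1.x option names; newer releases replace
    -pacbio-raw with -pacbio.
    """
    return [
        "canu",
        "-p", prefix,                   # prefix for output file names
        "-d", out_dir,                  # assembly working directory
        f"genomeSize={genome_size}",    # rough size of the assembly target
        "-pacbio-raw", reads,           # uncorrected PacBio reads
    ]


# Hypothetical names; "2m" reflects the ~2 Mb target from the question.
cmd = canu_command("tomato_renseq", "canu_out", "2m", "reads.fasta")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(cmd, check=True)` once the paths are real.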

If you want to try something else, you can try the Falcon assembler from Pacific Biosciences (https://github.com/PacificBiosciences/FALCON). Falcon aims to output a diploid assembly, where heterozygous regions of the genome are written to a separate file. One warning: the PacBio tools are currently undergoing deep changes (they are moving away from the bas.h5/bax.h5/cmp.h5 file formats in favor of standard fasta/sam/bam files).

The PacBio tools, Falcon included, are quite complicated to install. The two classic ways are to download all the dependencies from GitHub yourself (the hard way), or to use their tool called pitchfork (but I wouldn't recommend that; the PacBio engineers themselves call it "the painful way"...).

If you want to use the PacBio tools on the command line, I recommend following these steps, which I suggested on GitHub to someone else who was struggling with the installation: https://github.com/PacificBiosciences/pbalign/issues/67#issuecomment-272964848

As your genome is small enough (2 Mb, is that right?), you can also try assembling through SMRT Portal, using for example the HGAP 3 protocol.
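To get a feeling for how much data that is relative to the target, here is a rough depth estimate. It assumes the ~420 Mb FASTA corresponds to about 420 million read bases (headers and newlines make the real number a bit lower) and that the target is about 2.0 Mb. The function name is just for illustration, and for enrichment data the depth will be uneven across the target, so treat this as an average only.

```python
def mean_depth(total_read_bases, target_size):
    """Average sequencing depth = total read bases / target size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return total_read_bases / target_size


# Assumptions taken from the question: ~420 Mb of reads, ~2.0 Mb target.
READ_BASES = 420_000_000
TARGET_SIZE = 2_000_000

depth = mean_depth(READ_BASES, TARGET_SIZE)
print(f"approximate mean depth: {depth:.0f}x")
```

Under those assumptions the average depth is on the order of 200x, which is a useful number to have in hand when comparing assemblers.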

pbalign and quiver are very important because, with PacBio data, the error rate after assembly is still around 1%. You can lower this error rate by using your raw reads; this step is called polishing. You can use pbalign + quiver for it.
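To see why polishing matters, here is a small back-of-the-envelope calculation. It assumes the ~1% figure above is a per-base error rate and uses the ~2 Mb assembly size from the question; the function names are only illustrative.

```python
import math


def expected_errors(assembly_size, error_rate):
    """Expected number of erroneous bases at a given per-base error rate."""
    return round(assembly_size * error_rate)


def phred_quality(error_rate):
    """Convert a per-base error rate into a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


ASSEMBLY_SIZE = 2_000_000   # ~2 Mb, from the question
ERROR_RATE = 0.01           # ~1% after assembly, before polishing

errors = expected_errors(ASSEMBLY_SIZE, ERROR_RATE)
quality = phred_quality(ERROR_RATE)
print(f"expected erroneous bases: {errors}")
print(f"approximate consensus quality: Q{quality:.0f}")
```

With these numbers that is about 20,000 wrong bases (roughly Q20) in a 2 Mb assembly, which is why the polishing step is worth the effort.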

If you have any questions about polishing or tool installation, I can help; I've been through the same steps!

Good luck,

Roxane