Combined assembly analysis (short reads + long reads)
8.6 years ago
XC ▴ 20

Dear NGS Experts,

I have a question about combined genome assembly.

We have 75X HiSeq sequencing of an animal genome (about 3 Gb genome size) together with 50X PacBio Sequel data. We would now like to do a combined assembly of these 350 Gb of data. Does anybody know of any tools for this kind of analysis?
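As a rough sanity check on the data volume, coverage multiplied by genome size gives the expected number of sequenced bases. The short sketch below assumes a genome of exactly 3 Gb and uses a hypothetical helper name (`total_gigabases`); it is only back-of-the-envelope arithmetic, and it lands in the same range as the roughly 350 Gb quoted above.

```python
def total_gigabases(coverage: float, genome_size_gb: float) -> float:
    """Expected sequenced bases (in Gb) for a given fold coverage."""
    return coverage * genome_size_gb


# Assumption: genome size is taken as exactly 3 Gb ("about 3Gb" in the post).
GENOME_SIZE_GB = 3.0

hiseq_gb = total_gigabases(75, GENOME_SIZE_GB)   # Illumina HiSeq short reads
pacbio_gb = total_gigabases(50, GENOME_SIZE_GB)  # PacBio Sequel long reads
combined_gb = hiseq_gb + pacbio_gb

print(f"HiSeq:    {hiseq_gb:.0f} Gb")
print(f"PacBio:   {pacbio_gb:.0f} Gb")
print(f"Combined: {combined_gb:.0f} Gb")
```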

Many thanks.

Assembly sequencing genome • 3.7k views

With 50X PacBio data you should be able to assemble it on its own (provided it is good quality). Based on PacBio's recommendation, that should be enough for a good assembly. You can try to assemble the HiSeq data independently and then see if you can combine the two later.

Can you comment on what the Sequel data looks like? There is a dearth of real datasets for Sequel.


Hi genomax2, thanks a lot for your quick answer. We have a similar workflow planned. If there is a tool that can assemble both data types at the same time, that would be great, because the short reads can correct errors in the long reads and make them more reliable.

We are still waiting for the Sequel data from the sequencer. Once we have it, we will try to comment on it.

Thank you again.


FALCON is one option. I think it was used for the gorilla genome recently. There are plenty of other options on the Wiki page I linked in the previous post.
Since you are going to have plenty of PacBio data, you may not need to error correct using Illumina (I can't find the post from Dr. Hall from PacBio, but will update if I do).
Is this a diploid genome?


Is there any update on the Sequel data, such as its quality and price?

8.6 years ago
Rohit ★ 1.5k

From my own experience with a diploid animal genome, error correction of PacBio reads with Illumina data takes time and resources. Proovread worked well for our data at lower coverage (15X PacBio, 20X HiSeq), but it demands many nodes: 1800 in our case, with a run time of 4 days per node. It is worth the wait, though, and the developer (Thomas Hackl) is very responsive.

Canu works well for a combined approach, from what I have heard.

At 50X PacBio you could go for self-error correction (PBcR) and then use Quiver to polish the genome with PacBio data alone. In the end, you could use the HiSeq data to finish the genome with the Pilon pipeline.
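To make the order of that suggestion explicit, here is a minimal sketch that only records the three stages as data. It does not call any of the tools, and the `Stage` class, `PIPELINE` list, and `describe` helper are hypothetical names for illustration; the actual command lines and parameters for PBcR, Quiver, and Pilon should be taken from each tool's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str        # tool name as mentioned in the post
    input_data: str  # which reads the stage consumes
    purpose: str     # what the stage is meant to achieve


# Stage order as described above: long reads first, short reads last.
PIPELINE = [
    Stage("PBcR", "PacBio reads", "self-correct and assemble the long reads"),
    Stage("Quiver", "PacBio reads + draft assembly", "polish the consensus"),
    Stage("Pilon", "HiSeq reads + polished assembly", "finish with short reads"),
]


def describe(pipeline):
    """Return one human-readable line per stage, numbered in run order."""
    return [
        f"{i}. {s.tool}: {s.purpose} (uses {s.input_data})"
        for i, s in enumerate(pipeline, start=1)
    ]


for line in describe(PIPELINE):
    print(line)
```

The point of the ordering is that the HiSeq data enters only at the final step, so the expensive hybrid correction of raw long reads is avoided.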

8.6 years ago

I have recently heard good reviews of Canu.

8.6 years ago
shwethacm ▴ 240

FALCON for de novo assembly + Quiver for base error correction is a good combination. I haven't tried the other approaches, but they constantly come up when we do assemblies.

You can also error-correct the PacBio reads using Illumina data and then assemble with Celera Assembler: https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/pacBioToCA However, I think the reverse (as Rohit mentioned: assemble first, then run Pilon) is a more popular choice.

Here's a question: do you have mate-pair data? Most de novo PacBio assemblers give you contigs, which you can then place into scaffolds if you have mate pairs.
