Question

Reassemble Illumina and PacBio reads together, or just upgrade the previous assembly with PacBio?

Asked 7.0 years ago by Morgan S. ▴ 90

Hi,

I previously assembled a fungal genome from Illumina HiSeq paired-end reads. The assembly was ~32 Mbp and consisted of ~400 contigs. I did not try to join the contigs into scaffolds. BUSCO estimated the assembly to be ~98% complete, based on the number of orthologs found.
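When comparing the Illumina-only assembly with any PacBio-based one, it helps to compute the same contiguity numbers quoted here (contig count, total size) for each assembly, plus N50, a standard contiguity metric. Below is a minimal standard-library Python sketch. The function names are illustrative, the FASTA input is a toy in-memory example, and dedicated tools such as QUAST report these metrics (and many more) for real assemblies.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths: List[int] = []
    current: Optional[int] = None  # None until the first '>' header is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close the previous record
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Contig count, total size, largest contig and N50, all in bp."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
    }


# Tiny in-memory example; a real file handle can be passed instead of splitlines()
example = """>contig_1
ACGTACGTAC
>contig_2
ACGTA
"""
print(summarize(read_fasta_lengths(example.splitlines())))
# {'contigs': 2, 'total_bp': 15, 'largest_bp': 10, 'n50_bp': 10}
```

For a real assembly, `with open("assembly.fasta") as handle: summarize(read_fasta_lengths(handle))` works the same way (the file name is a hypothetical placeholder).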

I have now received PacBio reads from the same isolates and want to use them to improve the assembly and possibly close the genome. My question: should I reassemble the Illumina and PacBio reads together using SPAdes or another hybrid assembler, or should I upgrade the existing Illumina assembly with the PacBio reads using PBJelly?

Thanks in advance! Morgan

Tags: assembly, genome, illumina, pacbio • 5.6k views • last updated 7.0 years ago by harish ▴ 470


I don't know whether you have solved your problem by now, but I'll never stop citing this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100563/

It compares different assembly strategies depending on the available coverage, and also presents a tool you may want to try: quickmerge, which can merge two different kinds of assemblies to improve the result.
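quickmerge itself works from MUMmer (nucmer) alignments between the two assemblies. The sketch below is only a toy illustration of the underlying idea, not quickmerge's algorithm: if the end of one contig overlaps the start of another, the two can be spliced into one longer contig. Here an overlap is an exact suffix/prefix match of at least `min_overlap` bases, a hypothetical parameter; real data contain sequencing errors, which is why alignment-based tools are used instead of exact matching.

```python
from typing import Optional


def longest_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be >= 1")
    # Try the longest possible overlap first, shrinking down to min_overlap
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_contigs(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Splice two contigs whose end and start overlap exactly; None if they do not."""
    size = longest_overlap(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]  # keep the shared overlap only once


print(merge_contigs("AAACCCGGGTTT", "GGGTTTACGTAC", min_overlap=5))
# AAACCCGGGTTTACGTAC
```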

Let me know if this information was of any use to you :)

Reply by Rox ★ 1.5k • 7.0 years ago

I also recently stumbled upon this quickmerge tool and am currently applying it to my data. I must say I'm already a fan: it runs really fast and the results are more than OK.

Reply by lieven.sterck 15k • 7.0 years ago

Glad to hear I'm not the only quickmerge fan :)

Reply by Rox ★ 1.5k • 7.0 years ago

Add me to that list!

If you can hack the scripts a bit, or compute the delta files yourself with MUMmer 4, it becomes even faster.

Along the same lines, you could also try CAMSA.

Reply by harish ▴ 470 • 6.9 years ago

Interesting comment, harish, I haven't gotten that deep into it yet. Care to share some insight on how to switch to MUMmer4?

EDIT: Is it as easy as pointing it to the location of the MUMmer4 binaries?

Thanks for the tip, will look into it.

Reply by lieven.sterck 15k • 6.9 years ago

Try everything you can (a short summary of the much longer answer posted by @h.mon below).

Reply by GenoMax 152k • 7.0 years ago

score 2 · Answer 1 · 2018-07-19

Answer by h.mon 35k • 7.0 years ago

Edit: as soon as I posted, it occurred to me that a de novo hybrid assembly is probably better, as I think it has a better chance of correcting small duplications that may have been misassembled in the Illumina-only assembly.

End of edit.

It probably depends on the coverage of both the Illumina and the PacBio data, according to this page: Large Genome Assembly with PacBio Long Reads

[Figure: algorithm suggestions]

In any case, do not forget to polish your assembly after incorporating the PacBio reads:

On stuck records and indel errors; or “stop publishing bad genomes”
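Polishers such as Pilon and Racon are far more sophisticated, but the core idea of short-read polishing can be sketched as a per-position majority vote over reads aligned to the draft. The toy below assumes each read is already placed on the draft at a known offset with no gaps (a hypothetical simplification; real polishers work from read alignments). Because of that simplification it only corrects substitutions; indels, the error class the post above is about, need alignment-aware tools.

```python
from collections import Counter
from typing import List, Tuple


def polish_by_majority(draft: str, reads: List[Tuple[int, str]], min_depth: int = 3) -> str:
    """Replace each draft base with the majority base of the reads covering it.

    `reads` is a list of (offset, sequence) pairs; each read is assumed to align
    to the draft without gaps, starting at `offset` (toy simplification).
    Positions covered by fewer than `min_depth` reads keep the draft base.
    """
    columns = [Counter() for _ in draft]  # one base tally per draft position
    for offset, seq in reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(draft):
                columns[pos][base] += 1
    polished = []
    for draft_base, column in zip(draft, columns):
        if sum(column.values()) >= min_depth:
            polished.append(column.most_common(1)[0][0])  # majority base
        else:
            polished.append(draft_base)  # too little evidence: keep the draft
    return "".join(polished)


draft = "ACGTTACG"  # toy draft with a substitution error at position 3
reads = [(0, "ACGATA"), (1, "CGATAC"), (2, "GATACG"), (0, "ACGA")]
print(polish_by_majority(draft, reads))  # ACGATACG
```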

Last edited by h.mon 35k • 6.2 years ago

To add to this, I'd be inclined to go for a hybrid reassembly, as you might get some bonus short reads mapping that didn't map before, which would give you confidence in more of the PacBio base calls.

Reply by Joe 22k • 7.0 years ago

score 1 · Answer 2 · 2018-07-20

I would just assemble the new PacBio reads de novo, e.g. with Canu. I would be surprised if you didn't have 40X+ coverage. The PacBio assembly is going to be on a different planet from your existing assembly. The Illumina data can still be used to polish assembly errors with Racon and/or Pilon.
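The 40X figure is easy to check: mean coverage is total sequenced bases divided by genome size, so 40X of a ~32 Mbp genome means about 40 × 32 Mbp = 1.28 Gbp of PacBio data. A minimal Python sketch of that arithmetic follows; the genome size comes from the question, while the 1.5 Gbp run yield in the example is a hypothetical number, not one from the thread.

```python
from typing import Iterable

GENOME_SIZE_BP = 32_000_000  # ~32 Mbp, the size of the Illumina assembly in the question


def coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean sequencing depth: total bases sequenced divided by genome size."""
    return total_bases / genome_size


def coverage_from_read_lengths(read_lengths: Iterable[int], genome_size: int = GENOME_SIZE_BP) -> float:
    """Same as coverage(), starting from individual read lengths."""
    return coverage(sum(read_lengths), genome_size)


def bases_needed(target_depth: float, genome_size: int = GENOME_SIZE_BP) -> int:
    """Total bases required to reach `target_depth` X over the genome."""
    return round(target_depth * genome_size)


print(bases_needed(40))  # 1280000000, i.e. ~1.28 Gbp for 40X
print(coverage(1_500_000_000))  # hypothetical 1.5 Gbp run: 46.875
```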

score 1 · Answer 3 · 2018-07-31

It would be better to reassemble the data from the PacBio long reads and error-correct them using the Illumina reads.

Alternatively, you can do a hybrid assembly. Since your genome is only 32 Mb, you can safely use Unicycler.

For much larger genomes, what I generally do is a standalone long-read assembly, then call a consensus and polish it. Then, depending on how fragmented your short-read assembly is, you can merge the two using quickmerge, GAM-NGS, etc.

Personally, I have had good results with quickmerge. If you can hack the code a bit or run the steps individually, consider using MUMmer4.

Alternatively, run DBG2OLC on your corrected PacBio reads, using your Illumina assembly as the base.