Oxford Nanopore and Illumina hybrid assembly
7.8 years ago
igor 13k

Are there any de novo genome assemblers that work with both Nanopore and Illumina reads?

SPAdes can take both Nanopore and Illumina reads, but it's only for prokaryotic genomes. I haven't seen anything for eukaryotic.

All the discussion and literature that I have seen so far suggests using Nanopore long reads for assembly and then polishing with Illumina short reads. However, you need a certain level of coverage for the assembly to complete (for example, Canu's recommended minimum is 20X). What if you only have 1X coverage with long reads? That will not be enough to assemble on its own, but should be much better than short reads alone. What's the appropriate approach for that situation?
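To make the coverage numbers concrete, here is a minimal sketch of the arithmetic, assuming average depth is simply total sequenced bases divided by genome size. The function names and the toy genome and read sizes are hypothetical, chosen only for illustration.

```python
def coverage_depth(read_lengths, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def bases_needed(target_depth, genome_size):
    """Total bases required to reach a target average depth."""
    return target_depth * genome_size


# Hypothetical example: a 1 Mb genome covered by 200 long reads of 5 kb each.
genome_size = 1_000_000
reads = [5_000] * 200
print(coverage_depth(reads, genome_size))  # average depth for the toy data
print(bases_needed(20, genome_size))       # bases needed for a 20X target
```

This ignores read-length distribution and uneven coverage, so it is only a rough planning figure.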

nanopore ont Assembly • 7.5k views

How large is your genome expected to be?

You could give SPAdes a try. As long as you are not in "human" genome territory, it may work. I recall one of the SPAdes developers writing that it could be used for larger genomes (e.g. fungal genomes), but I can't find that post/thread at the moment.

Edit: The SPAdes manual advises against using the --careful option for "large or medium" eukaryotic genomes, so it looks like you could certainly try it out.


It should be around 500 Mb, so it's not too big, but certainly closer to human than bacterial size.

Good point about the --careful option, but the manual also says "SPAdes is not intended for larger genomes (e.g. mammalian size genomes)", so I am not sure which part to believe.


If you have some time (and I think you have the resources, if I recall from a 10x thread), go ahead and give it a try. At worst, the job will fail :)


Good memory!

I certainly plan to give it a try. I just wanted to know if I am missing anything and to have some alternatives in case it fails.


Can it be run within 256 GB of RAM?


I think Trinity is the best option for Nanopore reads in hybrid assembly.


Do you have a source for that? On GitHub I find the following:

Trinity assembles transcript sequences from Illumina RNA-Seq data.


I should've specified it's genome assembly, not transcriptome. Trinity is for RNA-seq.


Oh, sorry, that's true, it is for RNA-seq. What about IDBA_hybrid? You can use Nanopore reads as the reference.


Hi there, I'm new to the subject, but I will soon be facing the same questions. So far I have found only SPAdes and ALLPATHS-LG that do that.

With better coverage, what would be the best approach: using a pipeline to assemble de novo with Nanopore and Illumina data, assembling the genome with Nanopore data and then correcting it with Illumina data, or even completing the Illumina draft genome with Nanopore data?

Thank you very much,


Nanopore is still lacking in performance, and the cost/performance ratio remains high. I think that PacBio is the best option for long reads and for completing fragmented assemblies (from Illumina). Where are you from, lagartija? I know your name :).


Actually, I already have the Nanopore reads, so I can't change that. By the way, do you know what the difference is between SPAdes and SPAdes-hybrid? It seems that both can do hybrid assembly...

So you know my name? Do you mean lagartija or my real name? Haha, I'm from France, but I'm also Argentinian and Norwegian. And you? Italian?


No, it is not the same: you can use 'trusted contigs' for de novo assemblies with SPAdes, but not reads. SPAdes-hybrid, on the other hand, can perform de novo assemblies from long and short reads :). I am from the Congo, but I have lived in America for years, so I know lagartijas XD.


Aaah, I see. And how do I get the trusted contigs? And for both Illumina and Nanopore?


You can use old assemblies as trusted contigs (from the same or a closely related species); using genomes that are not closely related is not recommended in SPAdes. If you don't have access to old assemblies (or none exist), a de novo hybrid assembly is the only option. And yes, you can use reads from Nanopore and Illumina for hybrid assemblies with SPAdes-hybrid.


Only if you have them from some other source (e.g. an illumina only assembly).


Because from what I see here, SPAdes takes reads: http://spades.bioinf.spbau.ru/release3.10.1/manual.html

7.8 years ago
jblommaert92 ▴ 70

Just thought I'd add the few options I've seen:

1) OPERA-LG https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0951-y

2) PacBio recommendations may be relevant here https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads

3) LINKS https://gigascience.biomedcentral.com/articles/10.1186/s13742-015-0076-3

4) This workflow http://biorxiv.org/content/early/2016/05/22/054783

And this other question may be useful too: "Gap-filling and scaffolding using PacBio reads"

DISCLAIMER: I haven't tried any of these yet, but I'm also planning nanopore-illumina hybrid assembly soon


Those are excellent suggestions!

I should have been looking for "scaffolding" rather than "hybrid assembly", which is probably more appropriate in my case.

6.6 years ago
Carambakaracho ★ 3.3k

Besides the almost obvious SPAdes I recommend looking into the MaSuRCA assembler. I had very good results for PacBio/Illumina and Nanopore/Illumina data, though my long read coverage was in all cases a little bit higher than what you describe.

BTW, SPAdes easily handles metagenome assemblies well beyond 1 Gbp, and with the latest version the error messages on memory consumption were improved, so you'll find out pretty early whether it works or not. Give it a try; once it has assembled, I'd even try the --careful option. Whether it works depends more on the available memory on your machine (should probably be 128 GB or more) and the k-mer complexity of your genome than on its taxonomic domain.
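For anyone wanting to try the SPAdes route, here is a minimal sketch of how a hybrid run could be put together. It only builds and prints the command line, it does not run SPAdes. The file names, output directory, and thread count are hypothetical placeholders; the flags used (-1, -2, --nanopore, -o, -m, -t, --careful) are taken from the SPAdes manual, but check them against the version you have installed.

```python
import shlex


def spades_hybrid_command(r1, r2, nanopore, outdir,
                          memory_gb=None, threads=None, careful=False):
    """Build (but do not run) a SPAdes hybrid assembly command as a list."""
    cmd = ["spades.py",
           "-1", r1,                # Illumina forward reads
           "-2", r2,                # Illumina reverse reads
           "--nanopore", nanopore,  # Nanopore long reads
           "-o", outdir]
    if memory_gb is not None:
        cmd += ["-m", str(memory_gb)]  # memory limit in GB
    if threads is not None:
        cmd += ["-t", str(threads)]
    if careful:
        cmd.append("--careful")
    return cmd


# Hypothetical input files and resources.
cmd = spades_hybrid_command("illumina_R1.fastq.gz", "illumina_R2.fastq.gz",
                            "nanopore.fastq.gz", "spades_hybrid_out",
                            memory_gb=256, threads=16)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

The printed line can be pasted into a shell or passed to subprocess.run(cmd) once the paths point at real files.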
