Question

Which Assembler To Use For Metagenomic Sequences?

9

14.8 years ago

Panos ★ 1.8k

Do you use a specific assembler that you would like to recommend? Is there a particular trend towards a specific class of assemblers (for example, de Bruijn-based)? Is there an assembler that runs on 32-bit OSs, so that I can experiment at small scale (on my desktop) before going to full scale (on the server)?

metagenomics assembly • 16k views

ADD COMMENT • link updated 11.4 years ago by ugly.betty77 ★ 1.1k • written 14.8 years ago by Panos ★ 1.8k

1

You should say 'short-read assembler'; in computer science, 'assembler' has a different meaning.

ADD REPLY • link 14.8 years ago by Giovanni M Dall'Olio 28k

0

To answer what assembler you should use, we really need more information.

What kind of data do you have?

How many organisms/species are in the sample you want to sequence?

What do you want to do with the resulting assemblies?

ADD REPLY • link 14.8 years ago by Panos ★ 1.8k

0

To begin with, I haven't worked with genome assembly before and I'm just trying to understand how the various tools in a metagenomics workflow work...

At present, I don't have the actual data and I'm still 'playing' with simulated datasets generated by MetaSim that contain only bacterial sequences (both Sanger and 454). In the next stage I'll add some fungi, too. Regarding the number of species, I've started with only 2 bacterial genomes! Finally, the end goal is to perform gene calling, taxonomic profiling, etc.

ADD REPLY • link 14.8 years ago by Panos ★ 1.8k

0

@Jan van Haarst Hope I did the comments as you told me! If I didn't, let me know! Thank you for your time!

ADD REPLY • link 14.8 years ago by Panos ★ 1.8k

0

Edit your post (there is a link to edit it) then add this information into the question.

ADD REPLY • link 14.8 years ago by Istvan Albert 102k

0

Have a look at this other question.

ADD REPLY • link updated 5.3 years ago by Ram 44k • written 14.8 years ago by Giovanni M Dall'Olio 28k

score 6 · Answer 1 · 2010-04-15

Hi Guys,

I think you can have a look at this link: http://seqanswers.com/forums/showthread.php?t=43

This is an exhaustive list of free and commercial solutions for NGS data assembly.

More specifically, regarding the initial question, I agree with Eric: CLC Genomics Workbench is a very interesting integrated solution. You can also try MIRA3 (Linux, http://www.chevreux.org/mira_downloads.html).

Regards.

score 4 · Answer 2 · 2010-04-01

4

14.7 years ago

Darked89 4.7k

Here http://bit.ly/9CLset is the latest review of various next-gen assemblers.

If you plan on using Sanger sequencing and want to do some test runs then you may get real sequencing data including SCF files from: http://bit.ly/bkQCFG

Get some data for several species and run phrap or cap3 on them. To do this in a GUI, use Staden or Consed. Keep in mind that this route is close to being a thing of the past.

ADD COMMENT • link 14.7 years ago by Darked89 4.7k

1

Hi darked89, your first link no longer seems to work - do you remember where it was supposed to point? Thanks.

ADD REPLY • link 14.1 years ago by Bio_X2Y ★ 4.4k

1

@Bio_X2Y: sorry about the dead link. It was most likely: "Assembly algorithms for next-generation sequencing data", Jason R. Miller, Sergey Koren and Granger Sutton. Genomics, Volume 95, Issue 6, June 2010, Pages 315-327. doi:10.1016/j.ygeno.2010.03.001

ADD REPLY • link 14.1 years ago by Darked89 4.7k

0

Which route is close to being a thing of the past? Using the specific programs or the assembly itself?

ADD REPLY • link 14.7 years ago by Panos ★ 1.8k

0

What is moving into the "outdated" zone: Sanger sequencing (SS) for large projects, plus the pipelines used for processing such data. IMHO it is still great to do some DNA quality checking using SS before loading your DNA onto Illumina/454, or for assembly finishing/improvement. 454 or Illumina paired reads will give you far more data.

ADD REPLY • link 14.7 years ago by Darked89 4.7k

0

Thanks for the update!

ADD REPLY • link 14.1 years ago by Bio_X2Y ★ 4.4k

0

@darked89: thanks for the update!

ADD REPLY • link 14.1 years ago by Bio_X2Y ★ 4.4k

score 4 · Answer 3 · 2010-04-06

I've tried Arachne, Newbler and WGS on 454 datasets with varying results. In metagenomes from soil, with many species and relatively low coverage per species, you will get more of your reads assembled into contigs than in metagenomes with few different species. In such a dataset, contigs tend to break on variations between the different but similar species (WGS) or include ambiguities (Newbler). Binning may improve things. I'm curious about which options others have tried.
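To make the "contigs break on variations between similar species" point concrete, here is a toy sketch in Python. It assumes a de Bruijn-style graph purely for illustration (the assemblers named above are not necessarily built this way), and the two "strains", the read tiling, and all function names are made up for the example. Two sequences that differ at a single base each give an unbranched path on their own, but mixing their reads creates a fork where an assembler has to either stop the contig or record an ambiguity.

```python
from collections import defaultdict

def tile_reads(sequence, length, step):
    """Cut error-free, overlapping 'reads' out of a sequence (toy read simulator)."""
    return [sequence[i:i + length] for i in range(0, len(sequence) - length + 1, step)]

def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branch_points(graph):
    """Nodes with more than one successor, i.e. places where a contig cannot be extended unambiguously."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)

# Hypothetical sequences: strain_b differs from strain_a at one base (G -> A at index 8).
strain_a = "GATTACAGGCTCAT"
strain_b = "GATTACAGACTCAT"

reads_a = tile_reads(strain_a, length=8, step=2)
reads_b = tile_reads(strain_b, length=8, step=2)

alone_a = branch_points(de_bruijn_edges(reads_a, k=4))
alone_b = branch_points(de_bruijn_edges(reads_b, k=4))
mixed = branch_points(de_bruijn_edges(reads_a + reads_b, k=4))

print("branch points, strain A only:", alone_a)
print("branch points, strain B only:", alone_b)
print("branch points, mixed sample: ", mixed)
```

In the mixed sample the only branch point is the node just upstream of the variant base, which is where a contig would be cut in this simplified picture. Real data adds sequencing errors and repeats on top of this, so treat it as intuition only.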

score 3 · Answer 4 · 2010-09-14

3

14.3 years ago

Cyz70 ▴ 30

Just happened to read this thread. A comment on CLC: it is damn expensive (while there are open source alternatives), super fast (that was amazing), does not use quality data so far (absolutely not acceptable; yes, qualities of NGS are quite good nowadays, but still), and its output information is minimal...

WGS and MIRA are free; I prefer MIRA as it is highly tunable. And if you are lucky, you can get Newbler for free when you use 454 technology.

ADD COMMENT • link 14.3 years ago by Cyz70 ▴ 30

0

Hey, have you tried MIRA for metagenomic data? Which kind of data did you have (Illumina, 454...)? I've used it for genome assembly, but I don't know how it works with metagenomes. Thanks!

ADD REPLY • link 13.9 years ago by Marina Manrique ★ 1.3k

score 2 · Answer 5 · 2010-04-08

2

14.7 years ago

Eric Normandeau 11k

We are using a non-free solution, the CLC Genomics Workbench. This software has MANY capabilities. In the role you are asking about, it would easily assemble millions and millions of short reads (given enough RAM and, of course, a 64-bit system to run it on). You can easily put in data from different taxa, specify different criteria for the assembly, or alternatively, use a reference genome to assemble your data against, so as to potentially get a less messy result (less influenced by sequence divergence, paralogy...).

This software could be somewhat pricey for a small lab, but in the context of a research group, I have found that it was worth many times its price just in time saved on student projects.

The software also has A LOT of features for biologists working with sequences. Not only assembling NGS data.

DISCLAIMER (just in case...): I am IN NO WAY connected to this company. I just happen to be a happy user :)

Cheers.

ADD COMMENT • link 14.7 years ago by Eric Normandeau 11k

0

You can also buy just the terminal program clc_novo_assemble. I have version 3.0.2b, which uses SIMD instructions, meaning it's very, very fast. My only beef is that it doesn't give you any coverage information, only a FASTA file.

ADD REPLY • link 14.6 years ago by Science_Robot ★ 1.1k

0

(It also supports paired ends for Illumina data)

ADD REPLY • link 14.6 years ago by Science_Robot ★ 1.1k

score 2 · Answer 6 · 2011-11-24

2

13.1 years ago

Manu Prestat 4.1k

A "meta" version of Velvet is newly available. I have not tried it yet. http://metavelvet.dna.bio.keio.ac.jp/

ADD COMMENT • link 13.1 years ago by Manu Prestat 4.1k

Ram · Answer 7 · 2011-11-27

I've never tried it, but there's also the Genovo de novo assembler, specifically designed for assembling metagenomes, which interestingly uses a Bayesian approach.

They compared it against Velvet, EULER-SR, and Newbler, and it seems to have performed better.

From their abstract in their manual, which can be found on the Genovo link:

We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo’s reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly

But maybe these assemblers aren't the best comparison in terms of metagenomics assembly performance.

If you try it let me know how it performs.

score 2 · Answer 8 · 2013-08-13

2

11.4 years ago

ugly.betty77 ★ 1.1k

Metagenome assembly is different from genome or transcriptome assembly, because of the differences in counts of various samples (http://www.homolog.us/Tutorials/index.php?p=6.6&s=1).
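If "differences in counts" is read as uneven abundance of the organisms in a sample, a small Python sketch shows why that matters. Everything here is a made-up toy: the two "genomes", the 20:2 copy ratio, the cutoff, and the function names are assumptions for illustration, and whole genomes stand in for reads. The idea is that a single k-mer count cutoff, of the kind used to discard likely sequencing errors when coverage is assumed to be uniform, keeps the abundant organism and silently drops the rare one.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def fraction_retained(genome, counts, k, min_count):
    """Fraction of a genome's k-mers whose count reaches the cutoff."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    kept = sum(1 for kmer in kmers if counts[kmer] >= min_count)
    return kept / len(kmers)

# Hypothetical genomes that share no 5-mers, sampled at very different depths.
abundant_genome = "GATTACAGGCTCATTGCA"
rare_genome = "CCGTAAGTCTGGAACTTG"
reads = [abundant_genome] * 20 + [rare_genome] * 2  # 20x versus 2x "coverage"

counts = count_kmers(reads, k=5)

for name, genome in [("abundant", abundant_genome), ("rare", rare_genome)]:
    kept = fraction_retained(genome, counts, k=5, min_count=5)
    print(f"{name}: {kept:.0%} of k-mers survive a cutoff of 5")
```

With a cutoff of 5, all k-mers of the abundant genome survive and none of the rare genome's do, even though the rare genome's reads contain no errors at all. This is only intuition for why a uniform-coverage assumption fits metagenomes poorly, not a description of how any particular assembler works.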

Regarding the programs involved, most researchers I know, who are doing metagenome assembly every day, currently use Ray-Meta for its scalability to large samples. That is just an anecdotal observation and not a recommendation for one program versus another.

ADD COMMENT • link 11.4 years ago by ugly.betty77 ★ 1.1k

1

Ray Meta is also recommended in this useful tutorial: http://perso.eleves.bretagne.ens-cachan.fr/~chikhi/2013-evomics-assembly.pdf

ADD REPLY • link 11.4 years ago by Mikael Huss 4.8k

score 1 · Answer 9 · 2011-02-08

1

13.9 years ago

Marina Manrique ★ 1.3k

In this paper they use Newbler to assemble the reads, and they even got strain specificity: http://www.pnas.org/content/108/3/1128. If you're working (or plan to work) with 454 data, maybe you could try Newbler first; it's quite easy to use.

ADD COMMENT • link 13.9 years ago by Marina Manrique ★ 1.3k

score 1 · Answer 10 · 2011-11-24

1

13.1 years ago

Urchgene ▴ 10

The AMOS package is very useful. According to their publication, the Minimo assembler pipeline can be used for metagenomic assembly.

Another package called metAMOS (https://github.com/treangen/metAMOS), which depends heavily on AMOS, SOAP, Newbler and other tools, is also available and looks promising.

ADD COMMENT • link 13.1 years ago by Urchgene ▴ 10

score 1 · Answer 11 · 2011-11-24

1

13.1 years ago

Martin A Hansen 3.0k

MetaIDBA works really well.

ADD COMMENT • link 13.1 years ago by Martin A Hansen 3.0k

score 0 · Answer 12 · 2010-05-17

I am also curious to know which assembly program to use for metagenomic samples. Most of the programs I know, like EULER, Velvet, Arachne, etc., are designed for assembling a genome from a single species. Can these genome assemblers be used for metagenome assembly of Illumina reads? I know about MetaSim, but I am not using it for now.