Which Assembler To Use For Metagenomic Sequences?
12
9
14.7 years ago
Panos ★ 1.8k

Do you use a specific assembler that you would recommend? Is there a trend toward a particular class of assemblers (for example, de Bruijn-based)? Is there an assembler that runs on 32-bit OSes, so that I can experiment at small scale (on my desktop) before moving to full scale (on the server)?

metagenomics assembly • 16k views
ADD COMMENT
1

You should say 'short-read assembler'; 'assembler' has a different meaning in computer science.

ADD REPLY
0

To answer what assembler you should use, we really need more information.

What kind of data do you have?

How many organisms/species are in the sample you want to sequence?

What do you want to do with the resulting assemblies?

ADD REPLY
0

To begin with, I haven't worked with genome assembly before and I'm just trying to understand how the various tools in a metagenomics workflow work...

At present I don't have the actual data, and I'm still 'playing' with simulated datasets generated by MetaSim that contain only bacterial sequences (both Sanger and 454). In the next stage I'll add some fungi, too. As for the number of species, I've started with only 2 bacterial genomes! Finally, the end goal is to perform gene calling, taxonomic profiling, etc.

ADD REPLY
0

@Jan van Haarst Hope I did the comments as you told me! If I didn't, let me know! Thank you for your time!

ADD REPLY
0

Edit your post (there is a link to edit it) then add this information into the question.

ADD REPLY
0

Have a look at this other question.

ADD REPLY
6
14.6 years ago
Bioch'Ti ★ 1.1k

Hi Guys,

I think you can have a look at this link: http://seqanswers.com/forums/showthread.php?t=43

This is an exhaustive list of free and commercial solutions for NGS data assembly.

More specifically, regarding the initial question, I agree with Eric: CLC Genomics Workbench is a very interesting integrated solution. You can also try MIRA3 (Linux, http://www.chevreux.org/mira_downloads.html).

Regards.

ADD COMMENT
4
14.7 years ago
Darked89 4.7k

Here (http://bit.ly/9CLset) is the latest review of various next-gen assemblers.

If you plan on using Sanger sequencing and want to do some test runs, you can get real sequencing data, including SCF files, from: http://bit.ly/bkQCFG

Get some data for several species and run phrap or cap3 on them. To do this in a GUI, use Staden or Consed. Keep in mind that this route is close to being a thing of the past.

ADD COMMENT
1

Hi darked89, your first link no longer seems to work - do you remember where it was supposed to point? Thanks.

ADD REPLY
1

@Bio_X2Y: sorry about the dead link. It was most likely: "Assembly algorithms for next-generation sequencing data", Jason R. Miller, Sergey Koren and Granger Sutton, Genomics, Volume 95, Issue 6, June 2010, Pages 315-327, doi:10.1016/j.ygeno.2010.03.001

ADD REPLY
0

Which route is close to being a thing of the past? Using the specific programs or the assembly itself?

ADD REPLY
0

What is getting into the "outdated" zone: Sanger sequencing (SS) for large projects, plus the pipelines used for processing such data. IMHO it is still great to do some DNA quality checking using SS before loading your DNA on Illumina/454, or to use it for assembly finishing/improvement. 454 or Illumina paired reads will give you far more data.

ADD REPLY
0

Thanks for the update!

ADD REPLY
0

@darked89: thanks for the update!

ADD REPLY
4
14.7 years ago
Blackbox ▴ 40

I've tried Arachne, Newbler and WGS on 454 datasets, with varying results. In metagenomes from soil, with many species and relatively low coverage per species, you will get more of your reads assembled into contigs than in metagenomes with a small number of different species. In such a dataset, contigs tend to break at variations between the different but similar species (WGS) or to include ambiguities (Newbler). Binning may improve things. I'm curious about what options others have tried.
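A toy illustration of why contigs can break at variation between similar species. This is only a sketch under loud assumptions: it uses a simple de Bruijn-style k-mer graph, which is not how WGS or Newbler work internally (both are overlap-based), it ignores reverse complements and sequencing errors, and the two "strain" sequences and the `unitigs` helper are made up for the example.

```python
def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unitigs(seqs, k):
    """Maximal non-branching paths in a node-centric k-mer graph.

    Hypothetical helper for illustration only: nodes are k-mers, and two
    nodes are linked when they overlap by k-1 bases. Isolated cycles
    are not reported.
    """
    nodes = set()
    for s in seqs:
        nodes.update(kmers(s, k))
    succ = {n: [n[1:] + b for b in "ACGT" if n[1:] + b in nodes] for n in nodes}
    pred = {n: [b + n[:-1] for b in "ACGT" if b + n[:-1] in nodes] for n in nodes}

    def is_start(n):
        # A path starts where there is no unique predecessor,
        # or where the predecessor itself branches.
        return len(pred[n]) != 1 or len(succ[pred[n][0]]) != 1

    contigs = []
    for n in sorted(nodes):
        if not is_start(n):
            continue
        path, cur = n, n
        # Extend while the link to the next node is unambiguous both ways.
        while len(succ[cur]) == 1 and len(pred[succ[cur][0]]) == 1:
            cur = succ[cur][0]
            path += cur[-1]
        contigs.append(path)
    return contigs


# Two made-up "strains" that differ at a single position (A vs C).
strain_a = "ATGGCGTACGTTAGC"
strain_b = "ATGGCGTCCGTTAGC"

single = unitigs([strain_a], 5)
mixed = unitigs([strain_a, strain_b], 5)
print(len(single), single)
print(len(mixed), mixed)
```

With one strain the graph is a single path, so one contig comes out. With both strains the single difference creates a branch and a merge, and the same region falls apart into a shared left piece, two alternative middle pieces and a shared right piece.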

ADD COMMENT
3
14.2 years ago
Cyz70 ▴ 30

Just happened to read this thread. A comment on CLC: it is damn expensive (while there are open source alternatives), super fast (that was amazing), does not use quality data so far (absolutely not acceptable; yes, the qualities of NGS are quite good nowadays, but still), and its output information is minimal...

WGS and MIRA are free; I prefer MIRA as it is highly tunable. And if you are lucky, you can get Newbler for free when you use 454 technology.

ADD COMMENT
0

Hey, have you tried MIRA for metagenomic data? Which kind of data did you have (Illumina, 454...)? I've used it for genome assembly, but I don't know how it works with metagenomes. Thanks!

ADD REPLY
2
14.6 years ago

We are using a non-free solution, the CLC Genomics Workbench. This software has MANY capabilities. In the role you are asking about, it would easily assemble millions and millions of short reads (given enough RAM and, of course, a 64-bit system to run it on). You can easily put in data from different taxa, specify different criteria for the assembly, or alternatively use a reference genome to assemble your data against, so as to potentially get a less messy result (less influenced by sequence divergence, paralogy...).

This software could be somewhat pricey for a small lab, but in the context of a research group, I have found that it was worth its price many times over just in time saved on student projects.

The software also has A LOT of features for biologists working with sequences, not only for assembling NGS data.

DISCLAIMER (just in case...): I am IN NO WAY connected to this company. I just happen to be a happy user :)

Cheers.

ADD COMMENT
0

You can also buy just the terminal program clc_novo_assemble. I have version 3.0.2b, which uses SIMD instructions, meaning it's very, very fast. My only beef is that it doesn't give you any coverage information, only a FASTA file.

ADD REPLY
0

(It also supports paired ends for Illumina data)

ADD REPLY
2
13.0 years ago

A "meta" version of Velvet is newly available. I have not tried it yet. http://metavelvet.dna.bio.keio.ac.jp/

ADD COMMENT
2
13.0 years ago
Random ▴ 160

I have never tried it, but there's also the Genovo de novo assembler, specifically designed for assembling metagenomes, which interestingly uses a Bayesian approach.

They compared it against Velvet, EULER-SR, and Newbler, and it seems to have performed better.

From their abstract in their manual, which can be found on the Genovo link:

We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo’s reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly

But maybe these assemblers aren't the best comparison in terms of metagenomics assembly performance.

If you try it let me know how it performs.

ADD COMMENT
2
11.3 years ago
ugly.betty77 ★ 1.1k

Metagenome assembly is different from genome or transcriptome assembly, because of the differences in counts of various samples (http://www.homolog.us/Tutorials/index.php?p=6.6&s=1).

Regarding the programs involved, most researchers I know, who are doing metagenome assembly every day, currently use Ray-Meta for its scalability to large samples. That is just an anecdotal observation and not a recommendation for one program versus another.
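If the "differences in counts" above are read as uneven abundance of the organisms within one sample (my reading, not something confirmed from the linked tutorial), a back-of-the-envelope calculation shows the effect on coverage. Expected coverage for a species is roughly total reads × read length × fraction of reads from that species / genome size. All names and numbers below are hypothetical.

```python
def expected_coverage(total_reads, read_length, read_fractions, genome_sizes):
    """Expected average coverage depth per species.

    read_fractions: species -> fraction of all reads coming from it
    genome_sizes:   species -> genome length in bases
    """
    coverage = {}
    for species, fraction in read_fractions.items():
        sequenced_bases = total_reads * read_length * fraction
        coverage[species] = sequenced_bases / genome_sizes[species]
    return coverage


# Hypothetical sample: one dominant and one rare bacterium, same genome size.
read_fractions = {"dominant": 0.90, "rare": 0.10}
genome_sizes = {"dominant": 5_000_000, "rare": 5_000_000}

cov = expected_coverage(
    total_reads=1_000_000,
    read_length=400,  # roughly 454-sized reads, chosen only for illustration
    read_fractions=read_fractions,
    genome_sizes=genome_sizes,
)
for species, depth in cov.items():
    print(f"{species}: about {depth:.0f}x")
```

In this made-up case the dominant organism ends up near 72x while the rare one sits near 8x in the same run, which is the kind of spread a single-genome assembler tuned for uniform coverage does not expect.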

ADD COMMENT
1

Ray Meta is also recommended in this useful tutorial: http://perso.eleves.bretagne.ens-cachan.fr/~chikhi/2013-evomics-assembly.pdf

ADD REPLY
1
13.8 years ago
Marina Manrique ★ 1.3k

In this paper they use Newbler to assemble the reads, and they even got strain specificity: http://www.pnas.org/content/108/3/1128. If you're working (or plan to work) with 454 data, maybe you could try Newbler first; it's quite easy to use.

ADD COMMENT
1
13.0 years ago
Urchgene ▴ 10

The AMOS package is very useful. According to their publication, the Minimo assembler pipeline can be used for metagenomic assembly.

Another package called metAMOS (https://github.com/treangen/metAMOS), which depends heavily on AMOS, SOAP, Newbler and other tools, is also available and looks promising.

ADD COMMENT
1
13.0 years ago

MetaIDBA works really well.

ADD COMMENT
0
14.5 years ago
Rks ▴ 30

I am also curious about which assembly programs to use for metagenomic samples. Most of the programs I know, like EULER, Velvet, Arachne, etc., are designed for assembling the genome of a single species. Can these genome assemblers be used for metagenome assembly of Illumina reads? I know about MetaSim, but I am not using it for now.

ADD COMMENT
