What do you like for assembling paired-end #454 sequence data? (for 200kb assemblies)
Hi Cupton,
Now that the question has been adjusted, I would strongly advise you to read this webpage, which summarizes all the available (free and commercial) tools for assembling 454 data: http://seqanswers.com/forums/showthread.php?t=43
Regards.
"Previous answer" below:
Anyway, if you are looking for an open source 454 assembler, I think that MIRA3 is the best candidate! It runs on Linux and OS X.
Check: http://www.chevreux.org/projects_mira.html
Otherwise, you can have a look at Galaxy, an interesting emerging online tool: http://main.g2.bx.psu.edu/
Regards.
Generally, I'd use Newbler, which isn't open source but comes with the 454 equipment. Some people claim Celera gives higher quality, but so far the results have been ambiguous. I'd be wary of the de Bruijn graph-based assemblers, where I have seen very mixed results.
I'd also try to verify the assembly in any way you can: map reads (preferably independent ones) back to it, map ESTs or BAC ends, and so on, rather than relying only on statistics like N50.
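For reference, here is a minimal sketch of how N50 is usually computed, assuming you already have the contig lengths as a plain list of integers (the function name `n50` and the example lengths are just for illustration, not taken from any assembler's output). It shows why the number says nothing about correctness: it only looks at lengths, never at the sequence itself.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

A misassembly that wrongly joins two contigs would raise this number, which is one reason to check the assembly against independent data as well.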
An interesting resource on this is Nick Loman's blog entry on assembling Ion Torrent data:
http://pathogenomics.bham.ac.uk/blog/2011/05/first-look-at-ion-torrent-data-de-novo-assembly/
While not quite 454, the technology is similar in many respects.
Anyway, we got better results from Celera (approximately Newbler quality) than from CLC for 454 data, but your mileage may vary.
Can you do (de novo) assembly using Galaxy? Can't seem to find it...
Galaxy is not really a tool but a collection of tools, most of which are unrelated to assembly. You should specify which tool in Galaxy you have in mind.