Has anyone looked into the parallelization efficiency or the maximum memory usage of the ABySS or ALLPATHS-LG de novo assemblers? I'd like to access a computing resource and I need to gather this information for the proposal. It's difficult to estimate because of the fairly large memory required to assemble even a medium-sized dataset.
The resources required for de novo assembly depend on the genome size and the nature of the repeats in the species you will be sequencing. It is difficult to give a relevant answer without knowing something about the species' genome. Is it a mammal, plant, bacterium, or some other group?
It's a large eukaryotic genome. Mostly I'm interested in how the efficiency of ABySS (for example) scales with multiple processors.
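To be concrete about what I mean by scaling: I'd compare wall-clock time on one processor against the time on N processors and report the speedup and the parallel efficiency (speedup divided by N). Here is a minimal sketch of that arithmetic. The function name and the timings are placeholders I made up for illustration; they are not measurements from ABySS or any other assembler.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for a run on n_procs processors.

    speedup    = serial wall-clock time / parallel wall-clock time
    efficiency = speedup / number of processors (1.0 means ideal scaling)
    """
    if t_serial <= 0 or t_parallel <= 0 or n_procs <= 0:
        raise ValueError("times and processor count must be positive")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


# Hypothetical wall-clock times in hours, keyed by processor count.
# These numbers are invented placeholders, not real benchmark results.
timings = {1: 100.0, 8: 16.0, 32: 6.25}

for n, t in sorted(timings.items()):
    s, e = parallel_efficiency(timings[1], t, n)
    print(f"{n:>3} procs: speedup {s:.2f}x, efficiency {e:.0%}")
```

Real timings from a few processor counts on a representative dataset would replace the placeholder table, and the same table could carry peak memory per run for the proposal.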