I'm attempting de novo assembly of metatranscriptomic data, which is admittedly a very resource-intensive problem. I have ~206 million paired-end Illumina reads, each 100 bp long, generated via RNA-seq on environmental samples. I can create assemblies with Trinity and Velvet/Oases using a small portion of the reads; however, when I attempt to assemble the metatranscriptome from the full set of reads, both programs run for a day or so and then fail while trying to allocate memory. The server I am running on has 32 processors and 256 GB of RAM. I should also mention that for Velvet/Oases I am using K=61; I believe Trinity's K value is fixed at 25.
I am rather new at this. Does anyone have a sense of how unreasonable my parameters are? Is the idea of assembling 200 million reads ludicrous? I may be able to perform a dereplication step that would reduce the read count to ~50 million; a rough sketch of what I have in mind is below. Does anyone have assembly experience suggesting I might have more success with only 50 million reads?
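For clarity, by dereplication I mean removing exact duplicate read pairs before assembly. Here is a minimal Python sketch of the idea; the file names and the MD5-based pair hashing are my own illustration, not any particular tool's behavior:

```python
# Minimal sketch of exact dereplication for paired-end FASTQ.
# Assumption: "duplicates" means pairs whose forward AND reverse
# sequences are both identical. File names are illustrative.
import hashlib
from itertools import islice

def read_fastq(handle):
    """Yield (header, seq, plus, qual) records from a FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield tuple(line.rstrip("\n") for line in record)

seen = set()
with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2, \
     open("derep_1.fastq", "w") as o1, open("derep_2.fastq", "w") as o2:
    for r1, r2 in zip(read_fastq(f1), read_fastq(f2)):
        # Hash the concatenated pair so only the first copy of each
        # duplicate pair is written out.
        key = hashlib.md5((r1[1] + r2[1]).encode()).digest()
        if key not in seen:
            seen.add(key)
            o1.write("\n".join(r1) + "\n")
            o2.write("\n".join(r2) + "\n")
```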
Thanks...
I'd be interested in your results using digital normalization: http://ivory.idyll.org/blog/mar-12/diginorm-paper-posted.html. I think it might work better for metatranscriptomic data than partitioning will. The idea is to discard reads from regions whose k-mer coverage is already saturated, which reduces both the read count and the assembler's memory footprint before assembly ever starts.
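To make the filtering rule concrete, here is a toy Python sketch of the diginorm idea: keep a read only while the median abundance of its k-mers (counted over the reads kept so far) stays below a coverage cutoff C. The real implementation in khmer uses a memory-bounded count-min sketch rather than an exact counter, and k=20 / C=20 here are illustrative values, not a recommendation:

```python
# Toy sketch of digital normalization: keep a read only if the median
# abundance of its k-mers, counted over reads already kept, is below C.
# A plain Counter stands in for khmer's count-min sketch, for clarity only.
from collections import Counter
from statistics import median

K, C = 20, 20          # illustrative parameter choices
counts = Counter()

def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def keep_read(seq):
    """Return True (and update counts) if the read's median k-mer
    abundance is still below the cutoff C."""
    kms = kmers(seq)
    if not kms:
        return False
    if median(counts[km] for km in kms) >= C:
        return False   # coverage here is already saturated; drop the read
    for km in kms:
        counts[km] += 1
    return True

reads = ["ACGT" * 30, "ACGT" * 30, "TTGCA" * 25]  # toy input
kept = [r for r in reads if keep_read(r)]
print(f"kept {len(kept)} of {len(reads)} reads")  # the duplicate is dropped
```

Because the decision is made in a single streaming pass, memory scales with the number of distinct k-mers retained rather than with the total read count, which is exactly what you need when the assembler itself is what's running out of RAM.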