I'm attempting de novo assembly of metatranscriptomic data, which is admittedly a very resource-intensive problem. I have ~206 million paired-end Illumina reads, each 100 bp long, generated via RNA-seq on environmental samples. I can create assemblies with Trinity and Velvet/Oases using a small portion of the reads; however, when I attempt to assemble the metatranscriptome from the full set of reads, both programs run for a day or so and then fail while trying to allocate memory. The server I am running on has 32 processors and 256 GB of RAM. I should also mention that for Velvet/Oases I am using K=61; I believe Trinity's K value is fixed at 25.
I am rather new at this. Does anyone have a sense of how unreasonable my parameters are? Is the idea of assembling 200 million reads ludicrous? I may be able to perform a dereplication step that would reduce the read count to ~50 million; a rough sketch of what I have in mind is below. Does anyone have assembly experience suggesting I might have more success with only 50 million reads?
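For clarity, by dereplication I mean removing exact duplicate read pairs before assembly. Here is a minimal Python sketch of the idea; the file names and the MD5-based pair hashing are my own illustration, not any particular tool's behavior:

```python
# Minimal sketch of exact dereplication for paired-end FASTQ.
# Assumption: "duplicates" means pairs whose forward AND reverse
# sequences are both identical. File names are illustrative.
import hashlib
from itertools import islice

def read_fastq(handle):
    """Yield (header, seq, plus, qual) records from a FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        yield tuple(line.rstrip("\n") for line in record)

seen = set()
with open("reads_1.fastq") as f1, open("reads_2.fastq") as f2, \
     open("derep_1.fastq", "w") as o1, open("derep_2.fastq", "w") as o2:
    for r1, r2 in zip(read_fastq(f1), read_fastq(f2)):
        # Hash the concatenated pair so only the first copy of each
        # duplicate pair is written out.
        key = hashlib.md5((r1[1] + r2[1]).encode()).digest()
        if key not in seen:
            seen.add(key)
            o1.write("\n".join(r1) + "\n")
            o2.write("\n".join(r2) + "\n")
```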
Thanks...
I'd be interested in your results using digital normalization: http://ivory.idyll.org/blog/mar-12/diginorm-paper-posted.html. I think it might work better for metatranscriptomic data than partitioning will. The idea is to discard reads from regions whose k-mer coverage is already saturated, which reduces both the read count and the assembler's memory footprint before assembly ever starts.
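To make the filtering rule concrete, here is a toy Python sketch of the diginorm idea: keep a read only while the median abundance of its k-mers (counted over the reads kept so far) stays below a coverage cutoff C. The real implementation in khmer uses a memory-bounded count-min sketch rather than an exact counter, and k=20 / C=20 here are illustrative values, not a recommendation:

```python
# Toy sketch of digital normalization: keep a read only if the median
# abundance of its k-mers, counted over reads already kept, is below C.
# A plain Counter stands in for khmer's count-min sketch, for clarity only.
from collections import Counter
from statistics import median

K, C = 20, 20          # illustrative parameter choices
counts = Counter()

def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def keep_read(seq):
    """Return True (and update counts) if the read's median k-mer
    abundance is still below the cutoff C."""
    kms = kmers(seq)
    if not kms:
        return False
    if median(counts[km] for km in kms) >= C:
        return False   # coverage here is already saturated; drop the read
    for km in kms:
        counts[km] += 1
    return True

reads = ["ACGT" * 30, "ACGT" * 30, "TTGCA" * 25]  # toy input
kept = [r for r in reads if keep_read(r)]
print(f"kept {len(kept)} of {len(reads)} reads")  # the duplicate is dropped
```

Because the decision is made in a single streaming pass, memory scales with the number of distinct k-mers retained rather than with the total read count, which is exactly what you need when the assembler itself is what's running out of RAM.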