Tips for increasing metaSPAdes assembly speed for 7 billion reads?
anin.gregory, 7.3 years ago

I have 7 billion paired-end reads from multiple microbiome studies that I want to cross-assemble with metaSPAdes.

Background:

  • I need to use metaSPAdes
  • I have access to a node with 1.5 TB of memory where the job can run almost indefinitely, but the assembly must be finished by October
  • All reads have been error-corrected using bbnorm.sh
  • I started a cross-assembly on the 1.5 TB node with the '--only-assembler' flag, and it has been running for 3 weeks

For the last 1.5 weeks, the current assembly has been stuck on the post-simplification step 'Running Disconnecting edges with relatively low coverage'. I searched the SPAdes website and various forums to see whether this step is slow for others, but could not find any discussion of it. Has anyone else run into this problem? Does anyone have tips for speeding up the assembly?

Thanks!

My recommendation would be to use MEGAHIT in this case; it is much less resource-intensive than SPAdes.
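
For reference, a MEGAHIT co-assembly over many paired-end libraries could be launched roughly as sketched below. This assumes MEGAHIT v1.x options (`-1`/`-2` take comma-separated file lists, `-t` sets the thread count, `-m` given as a value between 0 and 1 is a fraction of total machine memory, and `--presets meta-large` targets large, complex metagenomes). The helper name, file names, and thread/memory values are hypothetical placeholders, so please check `megahit --help` on your installation before relying on it.

```python
import shlex
from typing import List, Optional


def build_megahit_cmd(
    r1_files: List[str],
    r2_files: List[str],
    out_dir: str,
    threads: int = 32,
    memory_fraction: float = 0.9,
    preset: Optional[str] = "meta-large",
) -> List[str]:
    """Build a MEGAHIT argument list for a multi-library co-assembly.

    Hypothetical helper: R1/R2 files are paired by position, so the
    i-th R1 file must be the mate file of the i-th R2 file.
    """
    if len(r1_files) != len(r2_files):
        raise ValueError("need exactly one R2 file per R1 file")
    if not 0 < memory_fraction < 1:
        raise ValueError("memory_fraction must be strictly between 0 and 1")
    cmd = [
        "megahit",
        "-1", ",".join(r1_files),    # comma-separated mate-1 files
        "-2", ",".join(r2_files),    # comma-separated mate-2 files
        "-o", out_dir,               # output directory (must not exist yet)
        "-t", str(threads),          # number of CPU threads
        "-m", str(memory_fraction),  # fraction of total machine memory
    ]
    if preset:
        cmd += ["--presets", preset]
    return cmd


if __name__ == "__main__":
    cmd = build_megahit_cmd(
        ["studyA_R1.fq.gz", "studyB_R1.fq.gz"],
        ["studyA_R2.fq.gz", "studyB_R2.fq.gz"],
        "megahit_coassembly",
    )
    # Print a shell-safe version of the command for inspection
    print(shlex.join(cmd))
```

MEGAHIT refuses to write into an existing output directory, so `out_dir` should not exist yet. To actually run the assembly, the list could be passed to `subprocess.run(cmd, check=True)`.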

If you download the latest version of BBMap, there is now a file at:

bbmap/pipelines/assemblyPipeline.sh

This script shows my suggested method of preprocessing data prior to assembly. It includes various trimming, filtering, and error-correction steps that minimize the number of erroneous k-mers, which increase the time and memory consumption of large metagenome assemblies, so it may be helpful in this case.
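
As a toy illustration of why erroneous k-mers matter (plain Python, not BBMap code): with k = 5, a single substitution near the middle of one read creates k new k-mers that occur nowhere in the true sequence. Each is seen only once, yet each would become part of the de Bruijn graph unless the error is corrected or the k-mer is filtered out. The read, k, and error position are made up for illustration; with realistic k values and billions of reads, these error k-mers add up quickly.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer (length-k substring) across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def with_substitution(read: str, pos: int, base: str) -> str:
    """Return a copy of `read` with one base replaced (a simulated sequencing error)."""
    return read[:pos] + base + read[pos + 1:]


if __name__ == "__main__":
    k = 5
    true_read = "ACGTACGGTCAGTTCAAGCT"  # toy 20 bp fragment; its 16 5-mers are all distinct
    reads = [true_read] * 10  # 10x error-free coverage
    reads.append(with_substitution(true_read, 10, "T"))  # one read with an A->T error

    counts = kmer_counts(reads, k)
    singletons = sorted(km for km, c in counts.items() if c == 1)
    print("distinct k-mers:", len(counts))  # 16 true + 5 from the error = 21
    print("k-mers seen once:", singletons)
```

The true k-mers each occur at least 10 times while the error k-mers occur once, which is the count gap that trimming, filtering, and error correction exploit to remove them before graph construction.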

All reads have been error-corrected using bbnorm.sh

Did you also normalize the reads?

No. Benchmarking in our lab showed that normalization shortens our contigs, because SPAdes uses differential coverage to resolve ambiguities.
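
The reasoning in this reply can be illustrated with a simplified, single-pass version of digital normalization (the "diginorm" idea; BBNorm's actual algorithm differs in its details). Capping every region near a target depth compresses the coverage differences between organisms, which are the same differences that coverage-aware steps in metaSPAdes can use. The two "genomes" below are 20 bp toy strings, the function name is hypothetical, and the target of 10 is arbitrary.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def normalize(reads: Iterable[str], k: int, target: int) -> List[str]:
    """Simplified digital normalization (single pass, exact counts).

    A read is kept only if the median count of its k-mers, among reads
    already kept, is below `target`; this caps every region at roughly
    `target` depth regardless of its original coverage.
    """
    counts: Counter = Counter()
    kept: List[str] = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read is shorter than k
        if median([counts[km] for km in kmers]) < target:
            kept.append(read)
            counts.update(kmers)
    return kept


if __name__ == "__main__":
    genome_a = "ACGTACGGTCAGTTCAAGCT"  # toy abundant organism
    genome_b = "TTGACCATGGCAATCGGTAA"  # toy rare organism, shares no 5-mers with A
    reads = [genome_a] * 50 + [genome_b] * 5  # 10:1 coverage ratio
    kept = normalize(reads, k=5, target=10)
    print("A reads kept:", kept.count(genome_a))  # 10
    print("B reads kept:", kept.count(genome_b))  # 5
```

Before normalization, A has 10 times the coverage of B; afterwards the ratio is 2:1, and with `target=5` it would be 1:1.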
