Question

Tips for increasing metaSPAdes assembly speed for 7 billion reads?

7.3 years ago

anin.gregory ▴ 110

I have 7 billion paired-end reads from multiple microbiome studies that I want to cross-assemble using metaSPAdes.

Background:

I need to use metaSPAdes
I have access to a 1.5TB memory node, where it can run almost indefinitely, but I have a deadline of October for the assembly to be done
All reads have been error-corrected using bbnorm.sh
I have started a cross-assembly on the 1.5TB node using the '--only-assembler' flag that has been running for 3 weeks
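
For reference, a run of this kind would look roughly like the sketch below. The file names, output directory, and thread count are placeholders, not the actual command used. `-m` is the SPAdes memory limit in GB; it defaults to 250, so it has to be raised explicitly for the assembler to use a large-memory node.

```shell
#!/bin/sh
# Sketch only: file names, output directory, and thread count are placeholders,
# not the actual command used for this run.
set -eu

R1="all_samples_R1.fq.gz"       # hypothetical pooled forward reads
R2="all_samples_R2.fq.gz"       # hypothetical pooled reverse reads
OUT="metaspades_crossassembly"  # hypothetical output directory
THREADS=64                      # assumed core count
MEM_GB=1500                     # memory limit in GB (SPAdes default is 250)

# --meta selects metaSPAdes; --only-assembler skips SPAdes' own read error correction.
set -- spades.py --meta --only-assembler -1 "$R1" -2 "$R2" -t "$THREADS" -m "$MEM_GB" -o "$OUT"

# Print the command; set RUN=1 in the environment to actually launch it.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```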

For the last 1.5 weeks the assembly has been stuck on the post-simplification step 'Running Disconnecting edges with relatively low coverage'. I searched the SPAdes website and various forums to see whether this step is slow for others, but could not find any discussion of it. Has anyone else had this problem? Does anyone have tips for speeding up the assembly?

Thanks!

Assembly Spades MetaSpades metagenome reads • 5.0k views

My recommendation would be to use MEGAHIT in this case; it is much less resource-intensive than SPAdes.
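
As a rough sketch, a MEGAHIT run on pooled reads might look like the following. The file names, output directory, and thread count are placeholders and not from this thread; `--presets meta-large` is MEGAHIT's built-in preset for large, complex metagenomes.

```shell
#!/bin/sh
# Sketch only: file names, output directory, and thread count are placeholders.
set -eu

R1="all_samples_R1.fq.gz"    # hypothetical pooled forward reads
R2="all_samples_R2.fq.gz"    # hypothetical pooled reverse reads
OUT="megahit_crossassembly"  # hypothetical output directory
THREADS=64                   # assumed core count

# Build the command as positional parameters so it can be inspected before running.
set -- megahit -1 "$R1" -2 "$R2" --presets meta-large -t "$THREADS" -o "$OUT"

# Print the command; set RUN=1 in the environment to actually launch it.
echo "$*"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```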

If you download the latest version of BBMap, there is now a file at:

bbmap/pipelines/assemblyPipeline.sh

That shows my suggested method of preprocessing data prior to assembly. It includes various trimming, filtering, and error-correction operations to minimize the number of erroneous k-mers that increase the time and memory consumption of large metagenome assemblies, so it may be helpful in this case.

7.3 years ago by Brian Bushnell 20k

"All reads have been error-corrected using bbnorm.sh"

Did you also normalize the reads?

7.3 years ago by st.ph.n ★ 2.7k

No. Benchmarking in our lab has shown that normalization reduces our contig lengths, because SPAdes uses differential coverage to resolve ambiguities.
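
To make the distinction concrete: BBNorm can be run for error correction only, keeping every read, or for normalization to a target depth. The sketch below prints both forms. The file names are placeholders, and the parameter values follow the examples in the BBNorm guide, not the settings actually used here.

```shell
#!/bin/sh
# Sketch only: file names are placeholders; parameter values are taken from
# the BBNorm guide's examples, not from the run discussed in this thread.
set -eu

IN="reads.fq"  # hypothetical input file

# Error correction only: keepall retains every read, so coverage is not flattened.
ecc_cmd="bbnorm.sh in=$IN out=corrected.fq ecc=t keepall passes=1 bits=16 prefilter"

# Normalization: downsample to a target depth of 100x and discard reads
# with apparent depth under 5x.
norm_cmd="bbnorm.sh in=$IN out=normalized.fq target=100 min=5"

# Print both commands for comparison; nothing is executed.
echo "$ecc_cmd"
echo "$norm_cmd"
```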

7.3 years ago by anin.gregory ▴ 110