I would like to share some observations about Diamond's performance on a fairly powerful AMD EPYC-based server compared to an older one. Surprisingly, performance decreases as I increase the thread count on the new system, and even with 32 threads I can't match the speed of the older system.
version: diamond v2.0.14.152
CLI: ./diamond blastx -d NR -q N65-111_dedup.fa -o N65-111 --threads (32) (64) (188) -f 100 -c1 -b20 --un N65-dark-111.fa
Reference system: (Ubuntu 20.04.3 LTS)
- 2x AMD EPYC 7282 - 1.5 GHz / 2.8 GHz (Boost), 64 threads, 384 GB RAM, Total time = 5075.2s
Decent system: (Ubuntu 21.10)
- 2x AMD EPYC 7643 - 2.30 GHz / 3.6 GHz (Boost), 188 threads, 1 TB RAM, Total time = 6979.27s
- 2x AMD EPYC 7643 - 2.30 GHz / 3.6 GHz (Boost), 64 threads, 1 TB RAM, Total time = 5638.73s
- 2x AMD EPYC 7643 - 2.30 GHz / 3.6 GHz (Boost), 32 threads, 1 TB RAM, Total time = 5555.9s
Reported 33332955 pairwise alignments, 33332955 HSPs. 2305437 queries aligned.
Database: NR.dmnd (type: Diamond database, sequences: 458431797, letters: 174524903011)
N65-111_dedup.fa is a metagenomic read set with 4297335 trimmed sequences (lengths: 80-190 nt)
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Target sequences to report alignments for: 25
In particular, the following operations slow down as the thread count increases: Masking reference, Building reference histograms, Computing alignments.
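To put the total times in relative terms, here is a small sketch that divides each run on the new system by the reference time. The numbers are the ones reported in this post; the function and variable names are only illustrative. It works out to roughly 9%, 11% and 38% slower for 32, 64 and 188 threads respectively.

```python
# Total times (seconds) as reported in this post.
REFERENCE_SECONDS = 5075.2  # 2x EPYC 7282, 64 threads
NEW_SYSTEM_SECONDS = {188: 6979.27, 64: 5638.73, 32: 5555.9}  # 2x EPYC 7643


def relative_times(reference, timings):
    """Return each timing divided by the reference (values > 1.0 mean slower)."""
    return {threads: seconds / reference for threads, seconds in timings.items()}


ratios = relative_times(REFERENCE_SECONDS, NEW_SYSTEM_SECONDS)
for threads in sorted(ratios):
    print(f"{threads:>3} threads: {ratios[threads]:.2f}x the reference time")
```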
Thank you in advance for all comments and suggestions!
More detailed information in the attachment:
I would try the run in multiprocessing mode: instead of a single 32-thread process, run 4 processes with 8 threads each (or something similar). That might indicate whether this is an IO-bound problem or some sort of thread contention. Not that I am an expert on either issue, but it would be an interesting data point to collect.
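A rough sketch of how the 4x8 experiment could be set up, assuming the query FASTA is split into equal-ish chunks and each chunk gets its own Diamond process. The helper names and the chunk/output file names are made up for illustration; the Diamond options are the ones from the command line in the original post. Note that each process will hold its own reference block in memory, so with `-b20` the total RAM use grows with the number of processes, and a smaller block size may be needed.

```python
import subprocess
import time


def split_fasta_records(lines, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of lines."""
    chunks = [[] for _ in range(n_chunks)]
    record_index = -1
    for line in lines:
        if line.startswith(">"):
            record_index += 1  # a header line starts a new record
        if record_index < 0:
            continue  # ignore anything before the first header
        chunks[record_index % n_chunks].append(line)
    return chunks


def build_commands(db, chunk_paths, threads_per_process):
    """Build one diamond blastx command (as an argument list) per query chunk."""
    commands = []
    for path in chunk_paths:
        stem = path.rsplit(".", 1)[0]  # hypothetical output naming scheme
        commands.append([
            "./diamond", "blastx", "-d", db, "-q", path, "-o", stem,
            "--threads", str(threads_per_process),
            "-f", "100", "-c1", "-b20", "--un", stem + "-dark.fa",
        ])
    return commands


def run_all(commands):
    """Start all commands at once, wait for them, return (wall seconds, exit codes)."""
    start = time.perf_counter()
    processes = [subprocess.Popen(cmd) for cmd in commands]
    exit_codes = [p.wait() for p in processes]
    return time.perf_counter() - start, exit_codes


# Example usage (not executed here, since it needs diamond and the data):
#   with open("N65-111_dedup.fa") as handle:
#       chunks = split_fasta_records(handle, 4)
#   paths = []
#   for i, chunk in enumerate(chunks):
#       paths.append(f"chunk{i}.fa")
#       with open(paths[-1], "w") as out:
#           out.writelines(chunk)
#   elapsed, codes = run_all(build_commands("NR", paths, 8))
```

If the combined wall time of the four 8-thread processes is clearly lower than the single 32-thread run, that would hint at contention inside one process; if it is about the same or worse, IO or memory bandwidth would be the more likely suspects.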