How to speed up CD-HIT clustering?


6.2 years ago

bitpir ▴ 250

I'm trying to run CD-HIT to cluster ~250M CDS at both the nucleotide and protein levels. These are mostly NR-like sequences from NCBI. According to the paper, clustering 4M sequences takes ~140 minutes with 8 cores. When I ran the job, it took more than 12 hours to process 1M sequences. I've tried increasing the number of CPUs to 24, but the speed barely changes. Below are the commands I used to run the clustering. Any help is appreciated! Thanks!

cd-hit-v4.6.8-2017-1208/cd-hit-est -i f1.nuc -o f1.nuc.out -n 10 -M 0 -T 8 -c 0.95 -r 0
cd-hit-v4.6.8-2017-1208/cd-hit -i f1.pep -o f1.pep.out -n 5 -M 0 -T 8 -c 0.95
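To put the gap in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes ">12 hours" is exactly 12 hours, so the real slowdown is at least this large. The paper's benchmark may have used different data and parameters, so this is only a ballpark comparison. The variable and function names are made up for illustration.

```shell
#!/bin/sh
# Figures quoted in the question. ">12 hours" is taken as exactly 12 h
# (an assumption), so the computed slowdown is a lower bound.
paper_seqs=4000000; paper_min=140   # paper: 4M seqs in ~140 min on 8 cores
obs_seqs=1000000;   obs_min=720     # observed: 1M seqs in 12 h = 720 min

# rate SEQS MINUTES: print sequences per minute, rounded to an integer.
rate() { awk -v s="$1" -v m="$2" 'BEGIN { printf "%.0f\n", s / m }'; }

paper_rate=$(rate "$paper_seqs" "$paper_min")
obs_rate=$(rate "$obs_seqs" "$obs_min")

# Ratio of the two unrounded rates, to one decimal place.
slowdown=$(awk -v ps="$paper_seqs" -v pm="$paper_min" \
               -v os="$obs_seqs" -v om="$obs_min" \
               'BEGIN { printf "%.1f\n", (ps / pm) / (os / om) }')

echo "paper:    $paper_rate seqs/min"
echo "observed: $obs_rate seqs/min"
echo "slowdown: ${slowdown}x"
```

So the run is roughly 20 times slower than the published throughput, which is why I suspect something is off with my setup rather than with the thread count alone.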

cdhit protein clustering nucleotide clustering • 1.7k views
