Hello. I am trying to cluster a huge FASTA file using CD-HIT-EST with a sequence identity threshold of 80%. According to the user's guide (http://www.bioinformatics.org/cd-hit/cd-hit-user-guide.pdf), I should use a word size (-n) of 5. However, it is taking forever. Could I change this parameter to -n 10 to speed up the process without changing the final result, i.e., get the same result as with -n 5?
This is my command:
cd-hit-est -i input -o output -d 0 -T 16 -g 0 -M 75000 -aL 0.97 -aS 0.97 -c 0.8 -n 5 -b 1
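For context, this is how I read the word-size table for cd-hit-est in the guide, written as a small Python sketch. The function and table names are my own (hypothetical), and the ranges are my reading of the guide, so please correct me if they are off. Where a threshold sits on the border between two rows (like my 0.8), I assumed it belongs to the upper row.

```python
# Word-size (-n) ranges for cd-hit-est as I understand them from the user guide.
# Each entry: (lower bound of the identity range, suggested -n values).
# Assumption: a border value such as 0.80 is assigned to the upper row.
WORD_SIZE_TABLE = [
    (0.95, (10, 11)),
    (0.90, (8, 9)),
    (0.88, (7,)),
    (0.85, (6,)),
    (0.80, (5,)),
    (0.75, (4,)),
]


def suggested_word_sizes(identity):
    """Return the -n values listed for a given -c identity threshold."""
    for lower_bound, sizes in WORD_SIZE_TABLE:
        if identity >= lower_bound:
            return sizes
    raise ValueError("no word size listed for thresholds below 0.75")


if __name__ == "__main__":
    # My run uses -c 0.8, which falls in the row that suggests -n 5.
    print(suggested_word_sizes(0.8))
```

By this reading, -n 10 is listed only for thresholds of 0.95 and above, which is why I am unsure whether it is safe with -c 0.8.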
Okay. Thank you for your help!