10.7 years ago
evsmithx
I'm trying to understand the parameters for blastclust, specifically score density. Is this the raw score divided by the sequence length, or the bit score divided by the sequence length? The documentation would seem to suggest the former, but I've found some papers (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2701696/) that suggest the latter. Any help would be much appreciated!
I don't know the answer but this might help.
Setting aside how odd this design is, it seems to imply that the score density should fall between 0 and 3. So the question is which of the two measures could sensibly be restricted to that range. I think bit score makes more sense, since the raw score depends more heavily on the scoring scheme (substitution matrix and gap penalties).
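To make the comparison concrete, here is a rough sketch of both candidate definitions. It uses the standard Karlin-Altschul conversion from raw score to bit score, S' = (lambda * S - ln K) / ln 2. The default lambda and K values are the commonly reported gapped BLOSUM62 parameters (gap open 11, extend 1) and are an assumption here, as are the function names and the choice of length in the denominator. This is not taken from the blastclust source.

```python
import math

def bit_score(raw_score, lam=0.267, k=0.041):
    """Convert a raw alignment score to a bit score (Karlin-Altschul).

    lam and k are assumed defaults (gapped BLOSUM62, 11/1); use the
    values reported for your own matrix and gap penalties.
    """
    return (lam * raw_score - math.log(k)) / math.log(2)

def score_density(score, length):
    """Score divided by length. Which length blastclust uses
    (alignment or sequence) is not settled in this thread."""
    return score / length

# Hypothetical hit: raw score 500 over 200 positions.
raw = 500
length = 200
print("raw score density:", score_density(raw, length))
print("bit score density:", score_density(bit_score(raw), length))
```

With these assumed parameters the bit score density comes out well below the raw score density for the same hit, so the two definitions are not interchangeable when picking a threshold.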
Best not to use blastclust at all. For a really fast k-mer based clustering, try 'cd-hit' or 'uclust'. For a BLASTP based clustering, use 'proteinortho5.pl' or 'orthoMCL.pl'.
Might also want to look at kClust.