10.5 years ago
Woa
I have two large sets of proteins (on the order of 100K each) and want to find the proteins that are unique to each set.
Is there a tool that does this quickly?
Thanks
The reason I would not use clustering tools is that they cluster based on input parameters and report whatever does not meet the criteria as unique. When one does not know how similar the other organism is, it is hard to choose a similarity cutoff. BLAST, in contrast, computes the similarity and tabulates the results, so we can pick the sequences that had no hit as unique to that particular file. If needed, the user can still apply cutoffs to the BLAST output file and select the hits of interest. That said, I would be happy to hear your reasons for choosing clustering techniques.
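To make the "pick those without a hit" step concrete, here is a minimal sketch. It assumes the BLAST results were written in tabular format (`-outfmt 6`, or `-outfmt 7` with `#` comment lines), where the first tab-separated column is the query ID. The function name `queries_without_hits` and the way the query IDs are supplied are my own choices for illustration, not part of any tool.

```python
def queries_without_hits(query_ids, blast_tab_lines):
    """Return the query IDs that never appear in the BLAST tabular output.

    query_ids       -- iterable of all sequence IDs that were used as queries
    blast_tab_lines -- iterable of lines from a tabular BLAST report
                       (assumed: first tab-separated column = query ID)
    """
    hit_queries = set()
    for line in blast_tab_lines:
        line = line.strip()
        # Skip blank lines and the '#' comment lines written by -outfmt 7.
        if not line or line.startswith("#"):
            continue
        hit_queries.add(line.split("\t")[0])
    # Preserve the input order of the query IDs.
    return [q for q in query_ids if q not in hit_queries]
```

In practice `blast_tab_lines` could simply be an open file handle on the report. Note that what counts as "no hit" still depends on the thresholds (for example the e-value) used when running BLAST.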
It depends on what one wants. If, as the question states, one wants to find the unique proteins in each set, then the problem is one of exact clustering, i.e. grouping identical sequences. Doing that is going to be very much faster than running an all-against-all (NxN) BLAST. I agree that if the questions are more subtle, having the NxN BLAST results to play with could be useful.
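For the exact-match case, a rough sketch of the idea: index each set by its sequence string and compare the keys. This assumes plain FASTA input and treats two proteins as the same only if their sequences are identical (case-insensitively). The names `read_fasta` and `unique_to_each` are made up for this example.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_to_each(fasta_a, fasta_b):
    """Return (only_in_a, only_in_b) as sorted lists of headers.

    Sequences are compared by exact string identity; when a sequence
    occurs several times within one set, the first header is kept.
    """
    def index(lines):
        by_seq = {}
        for header, seq in read_fasta(lines):
            by_seq.setdefault(seq.upper(), header)
        return by_seq

    seqs_a, seqs_b = index(fasta_a), index(fasta_b)
    only_a = sorted(h for s, h in seqs_a.items() if s not in seqs_b)
    only_b = sorted(h for s, h in seqs_b.items() if s not in seqs_a)
    return only_a, only_b
```

Both arguments can be open file handles. Dictionary lookups make this roughly linear in the total input size, which is why it is so much cheaper than pairwise alignment; the price is that a single-residue difference makes two proteins count as distinct.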
Yes, it depends on what one wants. Thanks.