Mery · 13 months ago
Hi, I need to build a protein database, but I have no experience with this. Can you recommend a course, or perhaps a paper? I have found some material myself, but I also wanted to ask people with more experience. I want to learn how to build a database, cluster proteins, and set thresholds. Best regards
You'll have to give more details about the database to get a detailed answer. What is the database going to be used for? How is it different from using the existing large curated protein databases out there?
Regarding clustering of proteins, I would look into using MMseqs2, it's probably the current gold standard for clustering sequences. If you want to cluster proteins by 3D structure, then DaliLite.v5 is a good place to start.
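Since the question also asks about setting thresholds, here is a minimal sketch of driving MMseqs2 from Python. It assumes the `mmseqs` binary is on your PATH and uses its `easy-cluster` module with `--min-seq-id` (minimum sequence identity), `-c` (minimum alignment coverage) and `--cov-mode` (0 = coverage of both query and target). The file names `proteins.fasta`, `clusterRes` and `tmp` and the threshold values are placeholders for illustration, not recommendations. The parser reads the `<prefix>_cluster.tsv` output, which lists one representative/member pair per line.

```python
import csv
import io
import os
import shutil
import subprocess
from collections import defaultdict


def mmseqs_easy_cluster_cmd(fasta, prefix, tmp_dir, min_seq_id=0.5, coverage=0.8, cov_mode=0):
    """Build an `mmseqs easy-cluster` command line.

    min_seq_id -> --min-seq-id (minimum sequence identity, 0..1)
    coverage   -> -c (minimum alignment coverage, 0..1)
    cov_mode   -> --cov-mode (0 = coverage of query and target)
    The default values here are illustrative placeholders.
    """
    return [
        "mmseqs", "easy-cluster", fasta, prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "-c", str(coverage),
        "--cov-mode", str(cov_mode),
    ]


def read_clusters(tsv_text):
    """Parse <prefix>_cluster.tsv: one 'representative<TAB>member' pair per line."""
    clusters = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2:
            rep, member = row[0], row[1]
            clusters[rep].append(member)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical input/output names; replace with your own files.
    cmd = mmseqs_easy_cluster_cmd("proteins.fasta", "clusterRes", "tmp")
    if shutil.which("mmseqs") and os.path.exists("proteins.fasta"):
        subprocess.run(cmd, check=True)
        with open("clusterRes_cluster.tsv") as fh:
            clusters = read_clusters(fh.read())
        print(len(clusters), "clusters")
    else:
        # MMseqs2 or the input is missing: just show the command that would run.
        print("would run:", " ".join(cmd))
```

Re-running with a few different `min_seq_id`/`coverage` values and comparing the number of clusters is a simple way to get a feel for how the thresholds behave on your own data.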
Generally agree with this comment. However, DaliLite is fairly slow for modern structure databases that include millions of predicted structures. There are faster options with similar sensitivity.
You're right, Foldseek is a much better tool. I had forgotten it includes a clustering function; I've only ever used it to search databases, so it slipped my mind.
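For structure-based clustering, a rough sketch along the same lines: it assumes the `foldseek` binary is on your PATH and that its `easy-cluster` module takes `input prefix tmpdir` plus a coverage threshold `-c`, in the same style as MMseqs2. The directory `structures/`, the prefix `res` and the 0.9 coverage are placeholder choices. The summary function reads `<prefix>_cluster.tsv` (representative/member pairs) and reports the cluster count, singletons and largest cluster, which is a quick sanity check after changing thresholds.

```python
import os
import shutil
import subprocess
from collections import Counter


def foldseek_easy_cluster_cmd(structure_dir, prefix, tmp_dir, coverage=0.9):
    """Build a `foldseek easy-cluster` command with a coverage threshold (-c)."""
    return ["foldseek", "easy-cluster", structure_dir, prefix, tmp_dir, "-c", str(coverage)]


def cluster_size_summary(tsv_lines):
    """Count members per representative in <prefix>_cluster.tsv and summarise sizes."""
    sizes = Counter()
    for line in tsv_lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] += 1
    return {
        "n_clusters": len(sizes),
        "n_members": sum(sizes.values()),
        "singletons": sum(1 for n in sizes.values() if n == 1),
        "largest": max(sizes.values(), default=0),
    }


if __name__ == "__main__":
    # Hypothetical paths; point these at your own structure files.
    cmd = foldseek_easy_cluster_cmd("structures/", "res", "tmp")
    if shutil.which("foldseek") and os.path.isdir("structures"):
        subprocess.run(cmd, check=True)
        with open("res_cluster.tsv") as fh:
            print(cluster_size_summary(fh))
    else:
        # Foldseek or the input is missing: just show the command that would run.
        print("would run:", " ".join(cmd))
```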