Could someone help me identify non-paralogous protein or nucleic acid sequences without relying on the CD-HIT tools? The CD-HIT web server is no longer operational, so what steps can we take to finish the task?
You could use MMseqs2 to cluster proteins in place of CD-HIT.
However, your question is pretty vague. Are you working with only one species, or clustering proteins across many species? That changes which clustering settings and thresholds you would use.
EDIT: I am unaware of any web-based protein clustering services. Can you get access to a Unix machine? That would make things a lot easier.
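In case it helps, here is a rough sketch of what a local MMseqs2 run could look like when driven from Python. It assumes the `mmseqs` binary is already installed and on your PATH, and it wraps the `easy-cluster` workflow. The file name `proteins.fasta`, the output prefix, and the identity/coverage thresholds are placeholders I made up for illustration, not recommendations; the right values depend on whether you are clustering within one species or across many.

```python
import shutil
import subprocess
from pathlib import Path


def build_cluster_command(fasta, out_prefix, tmp_dir, min_seq_id=0.9, coverage=0.8):
    """Return the argument list for `mmseqs easy-cluster`.

    min_seq_id and coverage are illustrative defaults (assumptions),
    not values suggested anywhere in this thread.
    """
    return [
        "mmseqs", "easy-cluster",
        str(fasta), str(out_prefix), str(tmp_dir),
        "--min-seq-id", str(min_seq_id),  # minimum sequence identity within a cluster
        "-c", str(coverage),              # minimum alignment coverage
    ]


def cluster_sequences(fasta, out_prefix="clusterRes", tmp_dir="tmp", **thresholds):
    """Run MMseqs2 locally and return the path of the representative-sequence FASTA."""
    if shutil.which("mmseqs") is None:
        raise RuntimeError("mmseqs not found on PATH; install MMseqs2 locally first")
    cmd = build_cluster_command(fasta, out_prefix, tmp_dir, **thresholds)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if mmseqs fails
    # easy-cluster writes one representative per cluster to <prefix>_rep_seq.fasta
    return Path(f"{out_prefix}_rep_seq.fasta")


if __name__ == "__main__":
    # "proteins.fasta" is a hypothetical input file name.
    reps = cluster_sequences("proteins.fasta", min_seq_id=0.5, coverage=0.8)
    print(f"Representative sequences written to {reps}")
```

The representative-sequence file is roughly the counterpart of the reduced set CD-HIT would give you; the same run also writes `<prefix>_cluster.tsv`, which lists each cluster representative alongside its members.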
I think the other comments are valid/the right approach, but the open question here is why bother with a webserver at all? Just do a local install.