Identifying non-paralogous protein or nucleic acid sequences without using CD-HIT
10 months ago

Could someone help me identify non-paralogous protein or nucleic acid sequences without relying on CD-HIT? The CD-HIT web server is no longer operational, so what steps can we take to complete this task?

CD-HIT • 582 views

I think the other comments are valid and describe the right approach, but the open question here is: why bother with a web server at all? Just do a local install.
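To make the local-install route concrete, here is a minimal sketch of driving a locally installed CD-HIT from Python. It assumes the `cd-hit` binary is already on your PATH (for example from a Bioconda install) and that the input is protein FASTA; the function names, file names, and the 0.9 identity threshold are placeholders for illustration, not recommendations.

```python
import shutil
import subprocess


def cd_hit_command(fasta_in, fasta_out, identity=0.9, word_size=5):
    """Build the argument list for a local CD-HIT run on protein sequences."""
    return [
        "cd-hit",
        "-i", fasta_in,        # input FASTA
        "-o", fasta_out,       # output FASTA holding one representative per cluster
        "-c", str(identity),   # sequence identity threshold
        "-n", str(word_size),  # word size; 5 is the usual choice for thresholds >= 0.7
    ]


def run_cd_hit(fasta_in, fasta_out, **kwargs):
    """Run CD-HIT if the binary is available; otherwise fail with a clear message."""
    if shutil.which("cd-hit") is None:
        raise FileNotFoundError("cd-hit not found on PATH; install it locally first")
    subprocess.run(cd_hit_command(fasta_in, fasta_out, **kwargs), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command without running it.
    print(" ".join(cd_hit_command("proteins.fasta", "proteins_nr.fasta")))
```

For nucleotide input the companion program `cd-hit-est` would be the one to call, with its own word-size choices.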

10 months ago
dthorbur ★ 2.6k

You could use MMseqs2 to cluster proteins in place of CD-HIT.
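As a rough sketch of what that could look like: after running something like `mmseqs easy-cluster proteins.fasta clusterRes tmp --min-seq-id 0.5 -c 0.8` (the thresholds here are placeholders, not a recommendation), MMseqs2 writes a two-column, tab-separated `clusterRes_cluster.tsv` listing the representative and the member for each sequence. The Python below groups that file by representative and reports singleton clusters. Treating singletons as "no paralog in this proteome at the chosen threshold" is a simplifying assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def read_clusters(tsv_lines):
    """Group members by cluster representative from 'rep<TAB>member' lines."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    return dict(clusters)


def singletons(clusters):
    """Return representatives whose cluster contains only themselves."""
    return sorted(rep for rep, members in clusters.items() if len(members) == 1)


if __name__ == "__main__":
    # Toy input standing in for clusterRes_cluster.tsv
    example = ["P1\tP1\n", "P1\tP2\n", "P3\tP3\n"]
    print(singletons(read_clusters(example)))
```

With a real run, you would pass an open file handle for the TSV instead of the toy list.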

However, your question is pretty vague. Are you only using one species, or clustering proteins among many species? That changes which clusters and thresholds you would use.

EDIT: I am unaware of any web-based protein clustering services. Can you get access to a Unix machine? That would make things a lot easier.


Yes, I am working with one species and want to compare it with the human proteome.


I don't work on humans, but this seems like a resource that will already be well annotated somewhere. There is even a filter to exclude paralogous genes in BioMart on Ensembl.
