All human pairwise sequence identities or similarities
4.1 years ago

Hi everyone,

According to UniProtKB, the human proteome contains 20,370 reviewed proteins. I would like to create a matrix of size 20,370 x 20,370 containing all pairwise protein sequence identities or similarities (ranging from 0 to 1). I would very much appreciate any hints regarding the following:
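For scale, here is a rough back-of-the-envelope sketch of how large such a matrix would be. The 4 bytes per value (float32) is an assumption for illustration; nothing in the question fixes the storage type.

```python
# Rough size of an all-against-all matrix for n proteins.
n = 20_370  # reviewed human proteins, as stated above

full_entries = n * n             # every ordered pair, diagonal included
unique_pairs = n * (n - 1) // 2  # unordered pairs, self-comparisons excluded

# Assumption: each identity is stored as one 4-byte float (float32).
bytes_per_value = 4
full_gb = full_entries * bytes_per_value / 1e9
upper_gb = unique_pairs * bytes_per_value / 1e9

print(f"full matrix entries: {full_entries:,}")
print(f"unique pairs:        {unique_pairs:,}")
print(f"dense float32:       {full_gb:.2f} GB")
print(f"upper triangle only: {upper_gb:.2f} GB")
```

Since identity is usually treated as symmetric, only the roughly 207 million unique pairs need to be computed and stored.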

(a) Have pairwise protein sequence identities or similarities already been pre-computed and made available for download? I am familiar with the UniRef clusters at 100%, 90% and 50% sequence identity, but what I am interested in is the pairwise sequence identities themselves rather than the sequence clusters.

(b) A number of robust tools have already been developed to calculate sequence similarities / identities and to cluster proteins, e.g. MMseqs2, Clustal Omega or blastall. If you are familiar with any other good tool for an all-against-all pairwise sequence similarity calculation, it would be great if you could share it on this thread.
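Whichever tool is used, the hits would still need to be collected into pairwise values. Below is a minimal sketch of that step, assuming the tool writes tab-separated hits with query ID, target ID and identity in the first three columns (a BLAST-style tabular layout; check the documentation of your tool and version). The function name `load_pairwise_identities`, the `percent` switch and the demo accessions are all hypothetical.

```python
import csv
import io

def load_pairwise_identities(handle, percent=False):
    """Read tab-separated hits into {(id_a, id_b): identity}, IDs sorted within each key."""
    pairs = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, target = row[0], row[1]
        # Assumption: column 3 is a fraction (0-1); pass percent=True if it is 0-100.
        identity = float(row[2]) / (100.0 if percent else 1.0)
        if query == target:
            continue  # self-hits add nothing to the pairwise table
        key = tuple(sorted((query, target)))
        # A pair can be reported in both directions or as several hits; keep the highest.
        pairs[key] = max(identity, pairs.get(key, 0.0))
    return pairs

# Made-up accessions and values, for illustration only.
demo = io.StringIO(
    "P1\tP2\t0.85\t100\n"
    "P2\tP1\t0.80\t95\n"
    "P1\tP1\t1.00\t120\n"
    "P3\tP1\t0.31\t60\n"
)
hits = load_pairwise_identities(demo)
print(hits)
```

Pairs with no reported hit are simply absent from the dictionary. Whether to treat them as 0 or as missing is a separate decision, since search tools typically only report pairs above some sensitivity threshold.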

Any hints would be greatly appreciated.

Thanks, Sergio

protein sequence-comparison • 959 views

I would like to create a matrix of size 20,370 x 20,370 containing all protein sequence identities or similarities (ranging from 0 to 1).

Not sure how you would come up with a score between 0 and 1. Proteins can be of very different sizes, e.g. insulin vs. titin. You could force them all to start at amino acid 1, but any identity matrix you generate that way would be a theoretical exercise.
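To make the length problem concrete, here is a minimal sketch using a plain Needleman-Wunsch global alignment with a simple match/mismatch/gap scheme. The scoring values, function names and toy sequences are assumptions for illustration, not what any of the tools mentioned above use. It reports the same pair under three common denominators.

```python
def global_align_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Return (identical_columns, alignment_length) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identical, length

def identities(a, b):
    """Same alignment, three denominators. Assumes both sequences are non-empty."""
    ident, aln_len = global_align_identity(a, b)
    return {
        "over_alignment": ident / aln_len,
        "over_shorter": ident / min(len(a), len(b)),
        "over_longer": ident / max(len(a), len(b)),
    }

# Toy sequences: a short peptide and a longer one that contains it at the start.
short_seq = "ACDEFG"
long_seq = "ACDEFG" + "K" * 14
result = identities(short_seq, long_seq)
print(result)
```

Here the short sequence is fully identical over its own length but only 30% identical over the longer sequence or the full alignment, so any 0 to 1 matrix has to state which denominator it uses.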
