Functional clustering of proteins
17 months ago
nadavi • 0

Hey everybody,

I have a list of N proteins from NCBI's nr database; for each protein, all I have is its name and sequence.

Is there a way I could cluster said proteins by their molecular functionality?

There's no point in reinventing the wheel every time, but I couldn't find any articles where people have done something similar, so any help would be appreciated.

Thanks!

functional-clustering proteins • 1.4k views

If you only have a list of names (no enrichment or expression scores to compare), you can simply do an over-representation analysis with a tool like g:Profiler or DAVID. DAVID is generally no longer recommended because its database is outdated, but it's straightforward enough to be worth a try.
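For anyone unsure what an over-representation analysis computes under the hood, here is a minimal sketch of the one-sided hypergeometric test that such tools are typically built around. The function name and the numbers are made up for illustration; real tools also apply multiple-testing correction across terms, which is left out here.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: number of proteins in the background set
    annotated:  background proteins carrying the term of interest
    sample:     size of your protein list
    hits:       proteins in your list carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Illustrative numbers only (assumed, not from any real dataset):
# 20000 background proteins, 400 with the term, a list of 50 with 8 hits.
p = enrichment_p_value(20000, 400, 50, 8)
print(f"p = {p:.3g}")
```

A small p-value suggests the term appears in the list more often than expected by chance. Note that this scores terms for the list as a whole; it does not group the proteins.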


Just to point out that while over-representation analysis can be useful for analysing a list, the question is about clustering, which isn't the same thing.

17 months ago

Since most clustering algorithms operate on a pairwise distance or similarity matrix, you need to compute such a matrix so that it captures the relevant similarities between the proteins in your list. "Molecular functionality" is somewhat imprecise. I generally take it to mean what's covered by the molecular function domain of the Gene Ontology, which you can use to compute a similarity between proteins, for example with the R Bioconductor package GOSemSim.
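To make the "pairwise matrix, then clustering" idea concrete, here is a rough sketch in plain Python. It is not what GOSemSim does: as a stand-in for a proper semantic similarity, it uses the Jaccard distance between sets of GO molecular-function terms, followed by a simple average-linkage agglomerative clustering. The protein names, their annotations, and the 0.6 threshold are all hypothetical, and it assumes you have already mapped your proteins to GO terms.

```python
from itertools import combinations

# Hypothetical GO molecular-function annotations (illustrative only).
annotations = {
    "protA": {"GO:0004672", "GO:0005524"},
    "protB": {"GO:0004672", "GO:0005524", "GO:0004674"},
    "protC": {"GO:0003677", "GO:0003700"},
    "protD": {"GO:0003677"},
}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; a crude stand-in for semantic similarity."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)

def pairwise_distances(annot):
    """Distance for every unordered pair of proteins."""
    return {
        frozenset((p, q)): jaccard_distance(annot[p], annot[q])
        for p, q in combinations(sorted(annot), 2)
    }

def average_linkage(c1, c2, dist):
    """Mean distance over all cross-cluster protein pairs."""
    total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def cluster(annot, threshold):
    """Merge the closest pair of clusters until none is within threshold."""
    dist = pairwise_distances(annot)
    clusters = [[name] for name in sorted(annot)]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        if average_linkage(clusters[i], clusters[j], dist) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)

clusters = cluster(annotations, threshold=0.6)
print(clusters)  # [['protA', 'protB'], ['protC', 'protD']]
```

In practice you would swap the Jaccard step for a GO-aware similarity that accounts for the ontology structure, and probably use an established clustering implementation rather than this loop, but the shape of the workflow stays the same.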


Yeah, you nailed what I'm attempting to do; I'll give it a shot.

Thanks a lot!
