Hey everybody,
I have a list of N proteins off NCBI's NR database, where the info I have for each protein is its name and sequence.
Is there a way I could cluster said proteins by their molecular functionality?
There's no point in reinventing the wheel every time, but I couldn't find any articles where people have done something similar, so any help would be appreciated.
Thanks!
If you only have a list of names (no enrichment or expression scores to compare), you can simply run an over-representation analysis with a tool like gProfiler or DAVID. DAVID is normally not recommended anymore because its database is outdated, but it's straightforward enough to be worth a try.
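In case it helps to see what such a tool computes, here is a minimal sketch of the statistic usually behind an over-representation analysis: a one-sided hypergeometric test for a single annotation term. The function name and the numbers are made up for illustration, and real tools differ in the details (background choice, multiple-testing correction), so treat this as a rough picture and not as what gProfiler or DAVID does internally.

```python
from math import comb


def overrepresentation_pvalue(n_background, n_annotated, n_list, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: proteins in the background set
    n_annotated:  background proteins carrying the term
    n_list:       proteins in your list
    n_overlap:    proteins in your list carrying the term
    """
    upper = min(n_annotated, n_list)
    total = comb(n_background, n_list)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 background proteins, 200 carry the term,
# 50 proteins in the list, 5 of them carry the term.
p = overrepresentation_pvalue(20000, 200, 50, 5)
print(f"p = {p:.3g}")
```

A small p-value suggests the term shows up in the list more often than expected by chance. In practice you would test many terms and correct for multiple testing, which the tools above handle for you.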
Just to point out that while over-representation analysis can be useful for analysing a list, the question is about clustering, which isn't the same thing.