Question

Need pairwise protein dataset with features and GO terms

9.0 years ago

tyhhs1991 • 0

I'm currently working on a deep learning method to predict whether two proteins have similar function, given features of the two proteins.

To train this model I need a suitable dataset. The requirements are:

it contains enough protein pairs, each with a reasonable number of pairwise features
every protein in the dataset has known GO terms (the GO terms are used to calculate the semantic similarity of the two proteins, which serves as the training label)
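To make the second requirement concrete, here is a minimal sketch of one way to turn GO annotations into a pairwise label. The post does not say which semantic similarity measure is intended, so this assumes a simple term-overlap score: the Jaccard index of the two annotation sets after adding all ancestor terms. The `PARENTS` graph and its term IDs are hypothetical placeholders, not real GO terms.

```python
from typing import Dict, Set

# Hypothetical toy fragment of a GO-like DAG: child term -> direct parents.
# The IDs are placeholders, not real GO terms.
PARENTS: Dict[str, Set[str]] = {
    "GO:C": {"GO:A"},
    "GO:D": {"GO:A", "GO:B"},
    "GO:E": {"GO:D"},
}


def with_ancestors(terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given terms plus all of their ancestors in the DAG."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def term_overlap(a: Set[str], b: Set[str], parents: Dict[str, Set[str]]) -> float:
    """Jaccard index of the two ancestor-closed annotation sets, in [0, 1]."""
    closed_a = with_ancestors(a, parents)
    closed_b = with_ancestors(b, parents)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0


# Label for a protein annotated with GO:C and another annotated with GO:E.
label = term_overlap({"GO:C"}, {"GO:E"}, PARENTS)
```

Information-content-based measures are a common alternative; this overlap score is only meant to show the shape of the label computation.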

Is there any dataset that meets these requirements?

So far I have only found this dataset: http://mine5.ics.uci.edu:1026/gain.html

It was generated from Lindahl's dataset and has pairwise features, but no GO term annotation:

Total number of unique proteins: 976
Total number of query-template pairs: 951600
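
The two counts are consistent with every protein being paired with each of the other 975 proteins in both directions (once as query, once as template). This is an inference from the numbers, not something stated in the post, and can be checked with a line of arithmetic:

```python
n_proteins = 976

# Ordered pairs of distinct proteins: each protein against every other one.
ordered_pairs = n_proteins * (n_proteins - 1)

print(ordered_pairs)
```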

Some of the protein names look like this:

1chl-d1chl
1tnfa-d1tnfa
1eac-d1eaf
1gdha-d1gdha2
2avia-d2avia
3pgm-d3pgm
1brnl-d1brnl
1pgga-d1prha2
..

I don't know what the name format means; it looks like a combination of PDB and SCOP identifiers.

If I use this dataset, how can I find the GO terms for each protein in it?
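
If that guess about the format is right, the names can be split into their parts with a small parser. This is a sketch under an assumed layout (a 4-character PDB ID with an optional chain letter, a hyphen, then `d` followed by a PDB ID with optional chain and domain number). The layout is inferred from the examples above, not taken from any documentation, and `parse_name` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumed layout (a guess, not documented):
#   <pdb id><chain?>-d<pdb id><chain?><domain?>
NAME_RE = re.compile(
    r"^(?P<pdb>[0-9][a-z0-9]{3})(?P<chain>[a-z0-9]?)"
    r"-d(?P<scop_pdb>[0-9][a-z0-9]{3})(?P<scop_chain>[a-z]?)(?P<scop_domain>[0-9]?)$"
)


def parse_name(name: str) -> Optional[Dict[str, str]]:
    """Split a dataset name into its assumed PDB and SCOP parts.

    Returns None when the name does not fit the assumed layout.
    Missing optional parts come back as empty strings.
    """
    match = NAME_RE.match(name.strip().lower())
    return match.groupdict() if match else None


for name in ["1chl-d1chl", "1gdha-d1gdha2", "1pgga-d1prha2"]:
    print(name, parse_name(name))
```

The extracted PDB IDs and chains could then be mapped to UniProt accessions, which is the identifier most GO annotation resources are keyed on.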

Thanks!

protein-function Go-terms • 2.2k views

updated 2.3 years ago by Ram 44k • written 9.0 years ago by tyhhs1991 • 0

Did you try to find orthologs of your query proteins? If you find orthologs, look up their GO terms in the GOA database.
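
As a rough sketch of the lookup step: GOA annotations are distributed as tab-separated GAF files, where column 2 is the database object ID (a UniProt accession for UniProtKB entries), column 4 the qualifier, and column 5 the GO ID. The function below collects GO IDs per accession from such lines and skips `NOT` annotations. It assumes you already have accessions for your proteins or their orthologs; `go_terms_by_accession` is a hypothetical helper, and the sample lines use placeholder accessions and GO IDs rather than real records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def go_terms_by_accession(gaf_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect GO IDs per database object ID from GAF-formatted lines."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in gaf_lines:
        # Header and comment lines in GAF files start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        accession, qualifier, go_id = cols[1], cols[3], cols[4]
        # A NOT qualifier means the protein is annotated as lacking the term.
        if "NOT" in qualifier.split("|"):
            continue
        terms[accession].add(go_id)
    return dict(terms)


# Placeholder lines for illustration only; not real annotations.
SAMPLE = [
    "!gaf-version: 2.2\n",
    "UniProtKB\tP00000\tGENE1\tenables\tGO:0000001\tPMID:0\tIDA\t\tF\n",
    "UniProtKB\tP00000\tGENE1\tNOT|involved_in\tGO:0000002\tPMID:0\tIDA\t\tP\n",
    "UniProtKB\tP00001\tGENE2\tlocated_in\tGO:0000003\tPMID:0\tIEA\t\tC\n",
]

annotations = go_terms_by_accession(SAMPLE)
```

With a real file, the same function can be fed an open file handle, since it only iterates over lines.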

9.0 years ago by Pallab Bhowmick ▴ 20