Hello guys. How do I download the complete datasets of protein entries that contain Gene Ontology (GO) information (Biological Process, Molecular Function, Cellular Component)? I want to download all of these datasets and load them into a MySQL database.
My second question: how do I download the complete InterPro (domain) datasets that contain fields for superfamily, family and subfamily? Which file should I download there?
Then select a tab- or comma-separated download (choose the compressed option as well for best results).
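A compressed tab-separated file can be streamed row by row instead of being decompressed to disk first, which keeps memory flat before you insert into MySQL. The sketch below uses only the Python standard library and assumes the first line of the file is a header row; it does not assume any particular column names, and the file name in the usage note is hypothetical.

```python
import csv
import gzip
from typing import BinaryIO, Dict, Iterator


def read_tsv_gz(fileobj: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a gzip-compressed, tab-separated file.

    Assumes the first line is a header row; its column names become the
    dict keys, whatever the download happens to contain.
    """
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.DictReader(text, delimiter="\t")
        for row in reader:
            yield row
```

Usage, with a hypothetical file name: `with open("annotations.tsv.gz", "rb") as fh: rows = read_tsv_gz(fh)`, then collect the rows into batches of a few thousand and pass each batch to your MySQL driver's `executemany` with a parameterised `INSERT`.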
You might want to write a script that uses offset and limit to page through the results, as the full download produces fairly large files.
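A paging loop of this kind can be sketched as follows. The answer does not give the exact request URL, so `fetch_page(offset, limit)` is a hypothetical callback you would implement yourself (for example, an HTTP GET that passes `offset` and `limit` as query parameters). The loop assumes that a page shorter than `limit` means the end of the results.

```python
from typing import Callable, Iterator, Sequence

# Hypothetical callback: returns the rows for one offset/limit window.
FetchPage = Callable[[int, int], Sequence[str]]


def page_through(fetch_page: FetchPage, limit: int = 10000) -> Iterator[str]:
    """Yield every row by requesting successive offset/limit windows.

    Stops when a page comes back with fewer than `limit` rows, which is
    assumed to mark the end of the result set.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

If the total row count is an exact multiple of `limit`, this makes one extra request that returns an empty page before stopping, which is harmless.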
Unlike the answer that uses the XML from the FTP site, this gives all current Gene Ontology annotations, not just those made by the UniProt consortium at the time of the UniProt release; i.e. it can contain somewhat more information than the XML file.
For UniProt, how do I find out which term is the parent node in the ontology?
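The GO annotations only carry GO term IDs; the parent/child structure lives in the ontology itself. GO is a directed acyclic graph, so a term can have several parents. Its three roots are biological_process (GO:0008150), molecular_function (GO:0003674) and cellular_component (GO:0005575). One way to get parents, assuming you download the ontology separately in OBO format (`go.obo`), is to read the `is_a:` lines of each `[Term]` stanza. This is a minimal sketch: it ignores other relationship types such as `part_of` and drops obsolete terms.

```python
from typing import Dict, List, Optional, Set


def parse_is_a(obo_text: str) -> Dict[str, List[str]]:
    """Map each non-obsolete GO term id to its direct is_a parents.

    Only [Term] stanzas and is_a lines are read; part_of, regulates, etc.
    are ignored in this sketch.
    """
    parents: Dict[str, List[str]] = {}
    obsolete: Set[str] = set()
    current: Optional[str] = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza: [Term], [Typedef], ...
            in_term = line == "[Term]"
            current = None
            continue
        if not in_term or not line:
            continue
        if line.startswith("id:"):
            current = line[3:].strip()
            parents.setdefault(current, [])
        elif current is None:
            continue
        elif line.startswith("is_a:"):
            # "is_a: GO:0008150 ! biological_process" -> "GO:0008150"
            tokens = line[5:].split()
            if tokens:
                parents[current].append(tokens[0])
        elif line == "is_obsolete: true":
            obsolete.add(current)
    for term in obsolete:
        parents.pop(term, None)
    return parents


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following is_a edges upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def roots(parents: Dict[str, List[str]]) -> List[str]:
    """Terms with no is_a parent, i.e. the top of each ontology."""
    return sorted(t for t, ps in parents.items() if not ps)
```

The `parents` mapping flattens naturally into a two-column (child, parent) MySQL table, which you can join against the annotation table by GO ID.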