I have UniProt IDs for 5000 prokaryotic proteins. Could anyone suggest how to get Gene Ontology annotations for these proteins, given that they belong to different taxa?
You can upload your list of UniProtKB identifiers to the UniProt Batch retrieval service at https://www.uniprot.org/id-mapping and select to map from UniProtKB to UniProtKB. Once you have your results, click on "Download", select the tsv (tab-separated) format, and choose the columns you would like to see in your table. There are options for GO terms, GO IDs, or GO terms separated by ontology (molecular function, biological process, cellular component).
Please don't hesitate to contact the UniProt helpdesk if you have any questions on how to use this service.
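For the table downloaded from UniProt as described above, it is often handier to have one accession/GO ID pair per line. Here is a minimal sketch, assuming the accession is in column 1 and the GO IDs sit in a column headed "Gene Ontology IDs" with values separated by "; ". Both are assumptions, so check the header of your own download. The function name go_pairs and the file name uniprot_go.tsv are hypothetical.

```shell
# go_pairs FILE: print "accession<TAB>GO ID", one pair per line.
# Assumes column 1 is the accession and that a column named
# "Gene Ontology IDs" holds "; "-separated GO IDs (an assumption;
# adjust the header name to match your download).
go_pairs() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {                        # find the GO ID column in the header
      for (i = 1; i <= NF; i++) if ($i == "Gene Ontology IDs") col = i
      next
    }
    col && $col != "" {              # skip proteins without GO IDs
      n = split($col, ids, /; */)    # split the list of GO IDs
      for (j = 1; j <= n; j++) print $1, ids[j]
    }
  ' "$1"
}

# Example (hypothetical file name):
#   go_pairs uniprot_go.tsv > accession_to_go.tsv
```

The two-column output is easy to sort, count, or feed into enrichment tools that expect one annotation per line.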
Something like this (join expects both inputs to be sorted on the join field, hence the two sort calls):
join -t $'\t' -1 1 -2 2 \
<(sort your.list.of.uniprot.ids) \
<(wget -O - "http://current.geneontology.org/annotations/goa_uniprot_all.gaf.gz" | gunzip -c | sort -T . -t $'\t' -k2,2)
The above is correct, but just to make this clear, as it's easily misunderstood: goa_uniprot_all.gaf does not contain all GO annotations, just the ones created or maintained by GOA, a (fantastic) team of curators at EBI. However, since the original list contains only prokaryotes, and the taxa in that list are rather less likely to have a dedicated Model Organism Database (like MGI, RGD, or SGD) with a specific GAF, the GOA file is indeed probably the best source for this particular user. goa_uniprot_all.gaf contains electronic annotations (IEAs), which I suspect would make up the bulk of annotations for most prokaryotes. The "all" in the filename is to contrast with goa_uniprot_all_noiea.gaf, which does not contain IEAs, just manual annotations (from GOA).
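To see how much of a given GAF is electronic versus manual, you can tally the evidence codes yourself. This is a rough sketch that relies on the GAF layout (tab-separated, evidence code in column 7, header lines starting with "!"). The function names evidence_counts and drop_iea are made up for illustration.

```shell
# evidence_counts FILE.gaf: count annotations per GO evidence code.
# GAF is tab-separated; column 7 is the evidence code and lines
# starting with "!" are header comments.
evidence_counts() {
  awk -F'\t' '!/^!/ { n[$7]++ } END { for (e in n) print e "\t" n[e] }' "$1" | sort
}

# drop_iea FILE.gaf: keep header lines and every annotation whose
# evidence code is not IEA (i.e. drop the electronic annotations).
drop_iea() {
  awk -F'\t' '/^!/ || $7 != "IEA"' "$1"
}

# Example (after the join above, or on any decompressed GAF):
#   evidence_counts my_proteins.gaf
#   drop_iea my_proteins.gaf > my_proteins_noiea.gaf
```

If IEA dominates the counts for your proteins, the noiea file would leave you with very few annotations, so the full file is likely the more useful one.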
If you can divide by taxon, there may be additional annotations in other files; for example, ecocyc.gaf.gz would be a good source for taxon:83333 (E. coli K-12).
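Dividing by taxon can be done straight from the GAF, since column 13 carries the NCBI taxon ID as "taxon:ID" (multi-organism annotations use "taxon:ID|taxon:ID2", with the annotated product's own taxon first). A small sketch along those lines; the function name gaf_by_taxon and the example file name are hypothetical.

```shell
# gaf_by_taxon TAXID FILE.gaf: print the annotations for one NCBI taxon.
# GAF column 13 is "taxon:ID", or "taxon:ID|taxon:ID2" for
# multi-organism annotations; only the first ID is compared here.
gaf_by_taxon() {
  awk -F'\t' -v t="taxon:$1" '
    !/^!/ {                          # skip header comment lines
      split($13, tax, /[|]/)
      if (tax[1] == t) print
    }
  ' "$2"
}

# Example: pull out the E. coli K-12 rows so they can be compared
# with, or supplemented by, ecocyc.gaf:
#   gaf_by_taxon 83333 my_proteins.gaf > ecoli_k12.gaf
```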
GO does have plans to make the annotation files specific to taxon(s), instead of the currently available files that are sorted by assigning group, but this is a complicated issue and we have no delivery date on it.
Your previous questions were answered: VCF file evaluation; T2T human genome; Conversion of Kegg id to uniprot id
Please accept the answers so the question is marked solved on the website. To do that, click on the green check mark on the left side of the answer.