Get UniProt GO terms in R
3.0 years ago
Space_Life ▴ 50

Hi, I have a CSV file that has around 20,000 gene IDs and their corresponding UniProtKB accessions. When mapped, there were around 850 UniProtKB accessions distributed across the 20,000 genes. I am supposed to get the GO terms for these genes and plot them in R. I came across QuickGO and the UniProt REST API. Can these be used in R for my purpose? Or should I just download the GO term CSV file from the UniProt website (bulk mapping)? I would appreciate your help. Thank you.

UniProt R Gene Ontology term GO • 3.0k views
3.0 years ago

There is an easy way to get your GO terms from UniProt IDs. Follow these steps:

  1. Go to the UniProt Retrieve/ID mapping page located at https://www.uniprot.org/uploadlists/

  2. Enter your list of UniProt IDs

  3. Set the mapping from UniProtKB AC/ID to UniProtKB (the default option)

  4. Hit submit

  5. In the table that appears, select the Columns tab to choose the fields you want to see. There is a Gene Ontology section where you can choose among 5 different options

  6. Hit Save

  7. Download the data. There are many different formats to choose from

Depending on the R package, you may have to replace the ";" and/or the double-quote characters with a different value. You can do this with the sed program, or with a search-and-replace tool such as Notepad++.
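The separators can also be handled in R itself, without editing the file first. This is only a minimal sketch: it assumes a tab-separated download with an `Entry` column and a `Gene Ontology (biological process)` column whose terms are separated by ";" (the actual headers depend on what was picked in the Columns tab). The inline text stands in for the downloaded file, and the accessions `ACC1` to `ACC3` are made up for illustration.

```r
# Toy stand-in for the downloaded table (hypothetical accessions).
# With a real file, something like this should work instead:
#   uniprot <- read.delim("your_download.tsv", check.names = FALSE, quote = "")
tsv_text <- paste(
  "Entry\tGene Ontology (biological process)",
  "ACC1\tapoptotic process [GO:0006915]; DNA repair [GO:0006281]",
  "ACC2\tDNA repair [GO:0006281]",
  "ACC3\t",
  sep = "\n"
)
# check.names = FALSE keeps the header text as is; quote = "" treats
# double quotes as ordinary characters
uniprot <- read.delim(text = tsv_text, check.names = FALSE, quote = "",
                      stringsAsFactors = FALSE)

# Assumed column name; adjust to the header in your own file
go_col <- "Gene Ontology (biological process)"

# Split each cell on ";" and trim whitespace: one vector of terms per protein
terms <- lapply(strsplit(uniprot[[go_col]], ";", fixed = TRUE), trimws)

# Long format: one row per (protein, GO term) pair
go_long <- data.frame(
  Entry = rep(uniprot$Entry, lengths(terms)),
  term  = unlist(terms),
  stringsAsFactors = FALSE
)
# Drop proteins without any annotation in this column
go_long <- go_long[!is.na(go_long$term) & nzchar(go_long$term), ]

# Separate the GO ID from the term name, e.g. "DNA repair [GO:0006281]"
go_long$go_id   <- sub(".*\\[(GO:[0-9]+)\\]$", "\\1", go_long$term)
go_long$go_name <- sub("\\s*\\[GO:[0-9]+\\]$", "", go_long$term)

print(go_long)
```

A long table like this (one row per protein and term) is usually easier to count, filter and plot than the original one-cell-per-protein layout.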


Thank you for the reply. This is helpful. I did that. Now I am trying to plot the GO terms in R using ggplot. Each of the GO (biological process), GO (molecular function) and GO (cellular component) columns has multiple GO IDs. I just learnt the basics of ggplot and I am able to make basic plots. Could you please help me with how I can plot the number of proteins for each GO term? I highly appreciate your help. Thank you.
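For reference, counting the proteins per GO term and drawing a bar chart could look roughly like the sketch below. It assumes a column holding several GO IDs per protein separated by ";"; the data frame `prot`, its column `go_bp` and the accessions are hypothetical placeholders, not real results. The plot is only drawn if ggplot2 is installed.

```r
# Toy table (hypothetical accessions); replace with the downloaded UniProt table
prot <- data.frame(
  Entry = c("ACC1", "ACC2", "ACC3"),
  go_bp = c("GO:0006915; GO:0006281", "GO:0006281", "GO:0051301; GO:0006281"),
  stringsAsFactors = FALSE
)

# One vector of unique GO IDs per protein, so a protein counts once per term
ids <- lapply(strsplit(prot$go_bp, ";", fixed = TRUE),
              function(x) unique(trimws(x)))

# Number of proteins annotated with each GO term, most frequent first
tab <- sort(table(unlist(ids)), decreasing = TRUE)
counts <- data.frame(
  go_id      = names(tab),
  n_proteins = as.integer(tab),
  stringsAsFactors = FALSE
)
print(counts)

# Horizontal bar chart of the 20 most frequent terms
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(head(counts, 20),
              aes(x = reorder(go_id, n_proteins), y = n_proteins)) +
    geom_col() +
    coord_flip() +
    labs(x = "GO term", y = "Number of proteins")
  print(p)
}
```

With many terms, restricting the chart to the most frequent ones (here via `head(counts, 20)`) keeps the axis labels readable.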


Plotting the GO terms directly is not the way to go.

You need to run an enrichment analysis, which will very likely include a Fisher's exact test, to identify the enriched GO terms.

The graphic you are looking for is very likely provided by several of the R packages that handle the enrichment, such as those listed under GO enrichment packages in Bioconductor.
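To illustrate the idea behind such a test, here is a minimal base R sketch of a one-sided Fisher's exact test per GO term, followed by a multiple-testing correction. All counts are made up for illustration: it assumes a study set of 850 proteins drawn from a background of 20,000, and the per-term numbers in `hits` are hypothetical. The dedicated enrichment packages do more than this (for example, handling the GO hierarchy), so treat it as a sketch of the principle rather than a replacement for them.

```r
# Hypothetical sizes: proteins of interest vs. all proteins considered
study_size <- 850
bg_size    <- 20000

# Hypothetical counts of annotated proteins per GO term
hits <- data.frame(
  go_id      = c("GO:0006281", "GO:0006915", "GO:0051301"),
  study      = c(40, 12, 5),      # annotated proteins in the study set
  background = c(300, 280, 400),  # annotated proteins in the whole background
  stringsAsFactors = FALSE
)

# One-sided Fisher's exact test: is the term over-represented in the study set?
enrich_p <- function(study_hits, bg_hits) {
  rest_hits <- bg_hits - study_hits  # annotated proteins outside the study set
  rest_size <- bg_size - study_size  # all proteins outside the study set
  tab <- matrix(
    c(study_hits, study_size - study_hits,
      rest_hits,  rest_size - rest_hits),
    nrow = 2,
    dimnames = list(c("annotated", "not_annotated"), c("study", "rest"))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

hits$p_value <- mapply(enrich_p, hits$study, hits$background)

# Correct for testing many terms at once (Benjamini-Hochberg)
hits$p_adjusted <- p.adjust(hits$p_value, method = "BH")

print(hits)
```

The choice of background matters: it should be the set of proteins that could have ended up in the study set, not necessarily the whole proteome.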
