Hi, I have a CSV file with around 20,000 gene IDs and their corresponding UniProtKB accessions. After mapping, there were around 850 UniProtKB accessions distributed across the 20,000 genes. I need to get the GO terms for these genes and plot them in R. I came across QuickGO and the UniProt REST API. Can these be used from R for my purpose, or should I just download the GO term CSV file from the UniProt website (bulk mapping)? I would appreciate your help. Thank you.
Thank you for the reply. This is helpful. I did that. Now I am trying to plot the GO terms in R using ggplot. Each of the GO (biological process), GO (molecular function) and GO (cellular location) columns contains multiple GO IDs per protein. I have just learnt the basics of ggplot and can make basic plots. Could you please help me plot the number of proteins for each GO term? I highly appreciate your help. Thank you.
Plotting the GO terms directly is not the way to go.
You need to run an enrichment analysis, which will very likely include a Fisher's exact test, to identify the enriched GO terms.
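To make the idea concrete, here is a minimal base R sketch of a per-term Fisher's exact test. Everything in it is an assumption for illustration: the toy table `annot`, its column names `uniprot` and `go_bp`, the `"; "` separator between GO IDs, and the list of proteins of interest `hits` are all made up. In a real analysis the background would be all annotated proteins and `hits` would be your proteins of interest.

```r
# Toy annotation table (hypothetical): one row per protein, with several
# GO IDs packed into one column, separated by "; ".
annot <- data.frame(
  uniprot = c("P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"),
  go_bp   = c("GO:0006915; GO:0008150",
              "GO:0006915",
              "GO:0006915; GO:0007049",
              "GO:0007049",
              "GO:0008150",
              "GO:0007049; GO:0008150",
              "GO:0008150",
              "GO:0006915"),
  stringsAsFactors = FALSE
)

# Hypothetical set of proteins of interest (a subset of the background).
hits <- c("P1", "P2", "P3")

# Split the multi-valued column into long format: one row per protein-term pair.
terms <- strsplit(annot$go_bp, ";\\s*")
long <- unique(data.frame(
  uniprot = rep(annot$uniprot, lengths(terms)),
  go_id   = unlist(terms),
  stringsAsFactors = FALSE
))

# Number of background proteins annotated with each GO term.
term_counts <- table(long$go_id)

# One-sided Fisher's exact test for over-representation of a single term.
fisher_for_term <- function(term) {
  annotated <- long$uniprot[long$go_id == term]
  in_hits   <- factor(annot$uniprot %in% hits,      levels = c(TRUE, FALSE))
  has_term  <- factor(annot$uniprot %in% annotated, levels = c(TRUE, FALSE))
  fisher.test(table(in_hits, has_term), alternative = "greater")$p.value
}

p_values   <- sapply(names(term_counts), fisher_for_term)
p_adjusted <- p.adjust(p_values, method = "BH")  # multiple-testing correction

print(data.frame(go_id = names(term_counts),
                 n_proteins = as.integer(term_counts),
                 p_value = p_values,
                 p_adjusted = p_adjusted,
                 row.names = NULL))
```

The dedicated enrichment packages do considerably more than this (for example, they take the GO hierarchy into account), so treat the sketch only as an illustration of what the test compares.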
The graphic you are looking for is very likely provided by several of the R packages that handle the enrichment, such as those you can see in this link: GO enrichment packages in Bioconductor
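If you would rather draw the figure yourself with ggplot, a rough sketch might look like the following. The data frame `enrich` and all of its values are invented for illustration; in practice it would be the result table returned by whichever enrichment package you use, and the column names will differ.

```r
# Hypothetical enrichment results: one row per GO term (values are made up).
enrich <- data.frame(
  term       = c("GO:0006915", "GO:0007049", "GO:0008150"),
  n_proteins = c(4, 3, 4),
  p_adjusted = c(0.03, 0.20, 0.75),
  stringsAsFactors = FALSE
)

# Only build the plot if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    enrich,
    ggplot2::aes(x = n_proteins,
                 y = reorder(term, n_proteins),   # order bars by protein count
                 fill = -log10(p_adjusted))       # colour bars by significance
  ) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Number of proteins",
                  y = "GO term",
                  fill = "-log10(adj. p)")
  print(p)
}
```

Each bar shows the number of proteins annotated with a term, and the fill colour shows how strongly that term is enriched, so the count you asked about and the enrichment result end up in the same figure.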