Get GO terms for location and filter protein table with Rstudio
2.6 years ago
Doralicia • 0

Hello everyone,

I am new to the GO database and to R.

I have a data table with a list of proteins (gene symbols), which I got by filtering a bigger table by fold change and p-value. See below for the head of my table.

> head(table)
         X    log2FC       FC      pvalue      padj
45   DDX21 0.8358637 1.784925 0.021688905 0.2737480
82  PDCD11 0.7647240 1.699045 0.037086918 0.2947572
104 RSL1D1 0.7923346 1.731875 0.034387111 0.2938215
202   TBL3 0.7217412 1.649171 0.004074165 0.1638396
228   NOP2 0.8724764 1.830803 0.005316531 0.1786989

I now need to filter this table by the organelle location of these proteins and by whether or not they are transmembrane proteins.

How can I get GO terms for location and transmembrane status so that I can filter my proteins? I was planning on using the packages org.Hs.eg.db and GO.db to get the GO terms and add them as columns to the table, so that I can ultimately filter my proteins, but what I tried is not working.

Could you please help me find suitable code for this purpose?

Thank you!

Rstudio location GOterms transmembrane GOdatabase • 1.4k views
2.6 years ago

Something like this should work:

library(org.Hs.eg.db)
library(clusterProfiler)

geneList <- c('DDX21', 'PDCD11', 'RSL1D1', 'TBL3','NOP2')

# converting gene symbols to Entrez gene IDs (and Ensembl IDs)
gene.df <- bitr(geneList, fromType = "SYMBOL",
                toType = c("ENSEMBL","ENTREZID" ),
                OrgDb = org.Hs.eg.db)
# GO classification; see ?groupGO for details on the arguments used here
ggo <- groupGO(gene     = gene.df$ENTREZID,
               OrgDb    = org.Hs.eg.db,
               ont      = "CC",
               level    = 3,
               readable = TRUE)

ggo_df <- data.frame(ggo)
# filtering out GO terms with no intersection with gene list
ggo_df <- ggo_df[ggo_df$Count > 0,]

The result (screenshot not preserved) is a data frame with the columns ID, Description, Count, GeneRatio and geneID.
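To get the GO terms back onto the original table as a column, the geneID column of ggo_df can be split on "/" and collapsed per gene. This is a minimal base-R sketch, not the only way to do it. The `ggo_df` below is a small stand-in whose rows are copied from output shown later in this thread, and `tbl` and the column name `CC_terms` are hypothetical names chosen for illustration:

```r
# Stand-in for the groupGO() result (rows copied from output shown in this thread)
ggo_df <- data.frame(
  ID          = c("GO:0098798", "GO:0005615"),
  Description = c("mitochondrial protein complex", "extracellular space"),
  geneID      = c("GRPEL2", "AHCTF1/LOXL2"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the protein table (only the symbol column X matters here)
tbl <- data.frame(X = c("DDX21", "GRPEL2", "AHCTF1", "LOXL2"),
                  stringsAsFactors = FALSE)

# One row per (gene, GO term) pair
genes_per_term <- strsplit(ggo_df$geneID, "/")
long <- data.frame(
  X           = unlist(genes_per_term),
  Description = rep(ggo_df$Description, lengths(genes_per_term)),
  stringsAsFactors = FALSE
)

# Collapse to one string of terms per gene
cc_terms <- aggregate(Description ~ X, data = long,
                      FUN = function(d) paste(unique(d), collapse = "; "))
names(cc_terms)[2] <- "CC_terms"

# Left join: genes without any term at this GO level get NA
annotated <- merge(tbl, cc_terms, by = "X", all.x = TRUE)
print(annotated)
```

Genes with NA in `CC_terms` simply had no cellular-component term at the chosen `level`; trying a different `level` in groupGO() may annotate them.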


Thank you so much for your help. I tried running your code on my whole list of proteins and got this warning:

> gene.df<-bitr(genelist, fromType= "SYMBOL", toType=c("ENSEMBL", "ENTREZID"), OrgDb = org.Hs.eg.db)
'select()' returned 1:many mapping between keys and columns
Warning message:
In bitr(genelist, fromType = "SYMBOL", toType = c("ENSEMBL", "ENTREZID"),  :
 10.71% of input gene IDs are fail to map...

Does that mean there are any incorrect gene symbols in my list?

Do you know how I can filter my table to remove the proteins that are located in one specific organelle or cell compartment? For example, removing all proteins that are located in the cytosol.


Does that mean there are any incorrect gene symbols in my list?

Check these posts: https://support.bioconductor.org/p/132388/ and "98.21% of input gene IDs are fail to map".
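A quick way to see which symbols were dropped is to compare the input vector with the SYMBOL column of the bitr() output. The sketch below is base R only; `genelist` and `gene.df` are small made-up stand-ins, the ENSEMBL and ENTREZID values are placeholders rather than real mappings, and "NOTAGENE" is a deliberately invalid symbol:

```r
genelist <- c("DDX21", "TBL3", "NOTAGENE")

# Made-up stand-in for the bitr() result; DDX21 appears twice to mimic a 1:many mapping
gene.df <- data.frame(
  SYMBOL   = c("DDX21", "DDX21", "TBL3"),
  ENSEMBL  = c("ENSG_placeholder_1", "ENSG_placeholder_2", "ENSG_placeholder_3"),
  ENTREZID = c("id_1", "id_1", "id_2"),
  stringsAsFactors = FALSE
)

# Symbols that did not map at all (commonly aliases, outdated names, or typos)
unmapped <- setdiff(genelist, gene.df$SYMBOL)

# Symbols that occur on more than one row (the "1:many" message)
multi_mapped <- unique(gene.df$SYMBOL[duplicated(gene.df$SYMBOL)])

# Keep one row per symbol/Entrez pair before passing the IDs on to groupGO()
gene.unique <- unique(gene.df[, c("SYMBOL", "ENTREZID")])

print(unmapped)
print(multi_mapped)
```

Inspecting `unmapped` by hand usually shows whether the failures are real errors or just non-official names for valid genes.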

....how can I filter and remove from my table proteins which are located in one concrete organelle or cell compartment? For example, removing all proteins which are located in cytosol.

Once you have identified which proteins are associated with, let's say, cytosol, subset your data frame to remove those proteins. The steps would be something like this:

# 1. select the entries in the geneID column where Description == 'cytosol'
# 2. drop those gene symbols from the data frame / original gene list

Thank you. The post you recommended was useful!

I tried what you suggested to filter my data.

Once you identified which protein is associated with let's say cytosol, then subset your data frame to remove those proteins. Steps toward that would be something like this

With this code I selected the cell compartments I wanted to remove from my table, as you suggested (I just changed the items in "Description").

# selecting items in the geneID column where Description == 'cytosol'

 ggo_df[ggo_df$Description %in% c("mitochondrial protein complex", "extracellular space"),]
                   ID                   Description Count GeneRatio       geneID
GO:0098798 GO:0098798 mitochondrial protein complex     1      1/30       GRPEL2
GO:0005615 GO:0005615           extracellular space     2      2/30 AHCTF1/LOXL2

Now, how can I remove the genes listed in the "geneID" column from my original data frame?

# keep only the GO terms whose genes should be removed
subData <- ggo_df[ggo_df$Description %in% c("mitochondrial protein complex", "extracellular space"), ]

# gene symbols to remove (geneID holds symbols separated by "/")
toRemove_genes <- unlist(strsplit(subData$geneID, "/"))

# remove duplicates
toRemove_genes <- unique(toRemove_genes)

# drop rows from the data table based on toRemove_genes
# ('data' is your original table; note the trailing comma, which subsets rows rather than columns)
finalData <- data[!(data$X %in% toRemove_genes), ]
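The row-dropping step can be checked on toy data. The sketch below uses a hypothetical one-column `data` frame with the symbols in column X, and shows why negating the logical test is safer than `-which(...)`: if no symbol matches, `-which(...)` produces an empty index and every row is lost.

```r
# Hypothetical stand-in for the protein table; only column X is used
data <- data.frame(X = c("DDX21", "GRPEL2", "AHCTF1", "LOXL2", "NOP2"),
                   stringsAsFactors = FALSE)

# Symbols taken from the geneID column shown earlier in the thread
toRemove_genes <- unique(unlist(strsplit(c("GRPEL2", "AHCTF1/LOXL2"), "/")))

# Row subsetting needs the trailing comma; negate the logical vector to drop matches.
# drop = FALSE keeps a data frame even though this toy table has a single column.
finalData <- data[!(data$X %in% toRemove_genes), , drop = FALSE]

# Pitfall: with nothing to remove, -which() indexes with integer(0) and drops all rows
nothing <- character(0)
lost <- data[-which(data$X %in% nothing), , drop = FALSE]
kept <- data[!(data$X %in% nothing), , drop = FALSE]

print(finalData)
print(c(lost = nrow(lost), kept = nrow(kept)))
```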