Sort By Go Terms With R
10.9 years ago
pchiang5 ▴ 30

Dear folks,

I have a list of genes with Ensembl IDs and GO terms matched by biomaRt. How can I extract a specific group from the list by GO term in R? For example, extracting "GO:0003700 sequence-specific DNA binding transcription factor" to make a new table containing all the information from the original table. Thanks

r biomart go • 5.1k views

Sorry, I am not familiar with biomaRt, but if you can show what your data looks like and what you would like to get, I can help with R tricks.


The .csv file contains tab-separated columns (please see below). For example, I would like to make a list with only genes containing "GO:0005216". How shall I tell R to do it? Thanks.

ENSMUSG00000000001 GO:0016021| GO:0005216| GO:0005244| GO:0005272
ENSMUSG00000000002 GO:0008150| GO:0005576| GO:0005575| GO:0003674
ENSMUSG00000000003 GO:0008150| GO:0005216| GO:0005524| GO:0003674


If you've already run biomaRt to obtain a data frame, then your aim is to extract subsets of a data frame. I can post examples if this is what you want to do; have a look at ?subset and ?grep in R.
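
A minimal sketch of that kind of subsetting, assuming the table is already in a data frame with one ID column and one column holding the pipe-separated GO IDs. The column names `ensembl_gene_id` and `go_ids` are made up for illustration; the three rows are the sample posted above.

```r
# Sample rows from the thread; the column names are assumptions for illustration.
genes <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000000001",
                      "ENSMUSG00000000002",
                      "ENSMUSG00000000003"),
  go_ids = c("GO:0016021| GO:0005216| GO:0005244| GO:0005272",
             "GO:0008150| GO:0005576| GO:0005575| GO:0003674",
             "GO:0008150| GO:0005216| GO:0005524| GO:0003674"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row; fixed = TRUE treats the GO ID
# as a literal string rather than as a regular expression.
hits <- subset(genes, grepl("GO:0005216", go_ids, fixed = TRUE))

print(hits$ensembl_gene_id)
```

Because `subset()` keeps every column of the matching rows, `hits` is a new table with all the information from the original one.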

10.9 years ago
Emily 24k

Yes, just add a GO term filter to your query. This is just 'go' as the filter, then your term(s) of interest. If you want a subset of another table, do the query without the GO filter, then the same query with the GO filter.

10.8 years ago
Pavel Senin ★ 1.9k

Here you go:

# load the data frame (read.table splits on whitespace by default)
dat = read.table("~/tmp/1.txt")

# strip the trailing "|" from every character or factor column
dt <- as.data.frame(
    lapply(dat, function(x) if (is.character(x) || is.factor(x)) gsub("\\|", "", x) else x))

# Find the rows that contain "GO:0005216" in any of columns 2 to 5.
# which() on the comparison matrix returns column-major linear indices, so
# shift to 0-based, take the remainder modulo the row count, and shift back
# to 1-based (R indexes from 1) to turn each linear index into a row number.
w = (which(dt[, 2:5] == "GO:0005216") - 1) %% nrow(dt) + 1

# print the result
dt[w, ]

> dt[w,]
                  V1         V2         V3         V4         V5
1 ENSMUSG00000000001 GO:0016021 GO:0005216 GO:0005244 GO:0005272
3 ENSMUSG00000000003 GO:0008150 GO:0005216 GO:0005524 GO:0003674

But if you already know that V3 is the column you are interested in, it is easy to query it directly:

# by using the exact value
w = which(dt$V3 == "GO:0005216")

# or using regex
w = which(grepl("GO:0005216",dt$V3))
dt[w,]
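
As an alternative to the modulo arithmetic, here is a sketch that builds a per-row logical mask with `rowSums()`. It assumes the same cleaned five-column layout (V1 for the gene ID, V2 to V5 for the GO IDs) and rebuilds the sample rows inline so it can be run without the input file.

```r
# Cleaned sample table from the thread (the "|" characters already removed).
dt <- data.frame(
  V1 = c("ENSMUSG00000000001", "ENSMUSG00000000002", "ENSMUSG00000000003"),
  V2 = c("GO:0016021", "GO:0008150", "GO:0008150"),
  V3 = c("GO:0005216", "GO:0005576", "GO:0005216"),
  V4 = c("GO:0005244", "GO:0005575", "GO:0005524"),
  V5 = c("GO:0005272", "GO:0003674", "GO:0003674"),
  stringsAsFactors = FALSE
)

# The comparison gives a logical matrix (rows x 4 columns); rowSums() counts
# the matches per row, so "> 0" keeps rows with a hit in any GO column.
keep <- rowSums(dt[, 2:5] == "GO:0005216") > 0
res <- dt[keep, ]

print(res)
```

A logical mask also avoids returning the same row twice when a term happens to appear in more than one column.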

Thanks a million Pavel!

I tried it and ran into another problem: the GO IDs in each row are concatenated (not separated by a tab or any other symbol except the vertical bar "|"). Also, the number of GO IDs differs from row to row. As a result, the search for "GO:0005216" returned numeric(0). How can I split the GO IDs into columns and define the number of columns to search?


If I understand correctly, the problem may be in your file, i.e. dat=read.table("~/tmp/1.txt") doesn't split it the way you expect, right? Please check the read.table manual for how it treats separators (by default the separator is ‘white space’, that is one or more spaces, tabs, newlines or carriage returns) and either pre-format your data before loading it into R, or try specifying sep="|". You can re-format the text in vim, for example, using substitution. I used the data you posted and it worked as it is because all the values are separated by spaces.
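
For rows with a varying number of GO IDs joined only by "|", one option is to split inside R instead of pre-formatting the file. This is only a sketch under assumptions: the two rows are hypothetical (modelled on the description above, using IDs from the thread) and the column names `id` and `go` are invented for illustration.

```r
# Hypothetical rows: GO IDs concatenated with "|", different counts per row.
genes <- data.frame(
  id = c("ENSMUSG00000000001", "ENSMUSG00000000002"),
  go = c("GO:0016021|GO:0005216|GO:0005244", "GO:0008150|GO:0005576"),
  stringsAsFactors = FALSE
)

# Split on the literal "|" (fixed = TRUE, because "|" is a regex
# metacharacter) and trim any stray whitespace around each ID.
go_list <- lapply(strsplit(genes$go, "|", fixed = TRUE), trimws)

# Long format: one row per gene/GO pair, so the column count never matters.
long <- data.frame(
  id = rep(genes$id, lengths(go_list)),
  go_id = unlist(go_list),
  stringsAsFactors = FALSE
)

# Exact membership test per gene, then subset the original table.
has_term <- vapply(go_list, function(ids) "GO:0005216" %in% ids, logical(1))
print(genes[has_term, ])
```

Keeping the IDs in a list (or in the long table) sidesteps the question of how many columns to define, since each gene can carry any number of terms.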

10.8 years ago
pchiang5 ▴ 30

Eventually I worked it out with the grep function:

# dat is the name of my data set and 2 is the column that holds the GO IDs

extractedrows <- dat[grep("GO:0006906", dat[, 2]), ]

Thank you all for the suggestions!
