Question

Sort By Go Terms With R

1

11.1 years ago

pchiang5 ▴ 30

Dear folks,

I have a list of genes with Ensembl IDs and GO terms matched by biomaRt. How can I extract a specific group from the list by GO term in R? For example, I would like to extract "GO:0003700 sequence-specific DNA binding transcription factor" into a new table that keeps all the information from the original table. Thanks.

r biomart go • 5.2k views

ADD COMMENT • link updated 7.6 years ago by Biostar 20 • written 11.1 years ago by pchiang5 ▴ 30

0

Sorry, I am not familiar with biomaRt, but if you can describe what your data looks like and what you would like to get, I can help with R tricks.

ADD REPLY • link 11.1 years ago by Pavel Senin ★ 1.9k

0

The .csv file contains tab-separated columns (please see below). For example, I would like to make a list with only the genes containing "GO:0005216". How should I tell R to do it? Thanks.

ENSMUSG00000000001 GO:0016021| GO:0005216| GO:0005244| GO:0005272
ENSMUSG00000000002 GO:0008150| GO:0005576| GO:0005575| GO:0003674
ENSMUSG00000000003 GO:0008150| GO:0005216| GO:0005524| GO:0003674

ADD REPLY • link 11.1 years ago by pchiang5 ▴ 30

0

If you've already run biomaRt to obtain a data frame, then your aim is to extract subsets of a data frame. I can post examples if this is what you want to do; have a look at ?subset and ?grep in R.
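As a minimal sketch of that suggestion, the snippet below combines subset() with grepl() on a small data frame. The data frame `genes` and its column names `gene` and `go` are assumptions made for illustration; a real biomaRt result will likely use different names.

```r
# Hypothetical data frame standing in for the biomaRt result;
# the names `genes`, `gene` and `go` are assumptions.
genes <- data.frame(
  gene = c("ENSMUSG00000000001", "ENSMUSG00000000002", "ENSMUSG00000000003"),
  go   = c("GO:0016021| GO:0005216| GO:0005244| GO:0005272",
           "GO:0008150| GO:0005576| GO:0005575| GO:0003674",
           "GO:0008150| GO:0005216| GO:0005524| GO:0003674"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE for each row; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression
hits <- subset(genes, grepl("GO:0005216", go, fixed = TRUE))
print(hits)
```

Because grepl() matches anywhere inside the string, this works regardless of how many GO IDs are concatenated in a row.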

ADD REPLY • link 11.1 years ago by Neilfws 49k

score 0 · Answer 1 · 2014-01-20

0

11.1 years ago

Emily 24k

Yes, just add a GO term filter to your query: use 'go' as the filter, then give your term(s) of interest as its values. If you want a subset of another table, do the query without the GO filter, then the same query with the GO filter.

ADD COMMENT • link 11.1 years ago by Emily 24k

score 0 · Answer 2 · 2014-01-21

0

11.1 years ago

Pavel Senin ★ 1.9k

Here you go:

# load data frame
dat = read.table("~/tmp/1.txt")

# strip the "|" separator from every text column
dt <- as.data.frame(
    lapply(dat, function(x) if (is.character(x) || is.factor(x)) gsub("\\|", "", x) else x))

# find the rows that contain "GO:0005216" in any of columns 2 to 5:
# which() on the logical matrix returns column-major linear indices, so
# shift to 0-based, take the remainder modulo the number of rows, and
# shift back to 1-based (R indexes from 1) to recover the row numbers
w = (which(dt[,2:5] == "GO:0005216") - 1) %% length(dt$V1) + 1

# print the result
dt[w,]

> dt[w,]
                  V1         V2         V3         V4         V5
1 ENSMUSG00000000001 GO:0016021 GO:0005216 GO:0005244 GO:0005272
3 ENSMUSG00000000003 GO:0008150 GO:0005216 GO:0005524 GO:0003674

But if you know that V3 is the column you are interested in, it is easy to query it directly:

# by using the exact value
w = which(dt$V3 == "GO:0005216")

# or using regex
w = which(grepl("GO:0005216",dt$V3))
dt[w,]
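For comparison, here is a sketch of two alternatives to the index arithmetic that select the same rows. The data frame is rebuilt from the values printed above so the snippet runs on its own; `is_hit`, `rows_any` and `rows_arr` are illustrative names, not part of the original answer.

```r
# Rebuild the five-column table shown above (values copied from the thread)
dt <- data.frame(
  V1 = c("ENSMUSG00000000001", "ENSMUSG00000000002", "ENSMUSG00000000003"),
  V2 = c("GO:0016021", "GO:0008150", "GO:0008150"),
  V3 = c("GO:0005216", "GO:0005576", "GO:0005216"),
  V4 = c("GO:0005244", "GO:0005575", "GO:0005524"),
  V5 = c("GO:0005272", "GO:0003674", "GO:0003674"),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a single value gives a logical matrix
is_hit <- dt[, 2:5] == "GO:0005216"

# Alternative 1: keep rows where at least one GO column matched
rows_any <- which(rowSums(is_hit) > 0)

# Alternative 2: ask which() for (row, col) positions instead of linear indices
rows_arr <- sort(unique(which(is_hit, arr.ind = TRUE)[, "row"]))

print(dt[rows_any, ])
```

Both alternatives return each matching row once, even if a term were to appear in more than one column of the same row.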

ADD COMMENT • link 11.1 years ago by Pavel Senin ★ 1.9k

0

Thanks a million, Pavel!

I tried it and ran into another problem: the GO IDs in each row are concatenated (not separated by tabs or any symbol other than the vertical bar "|"). Also, the number of GO IDs differs from row to row. As a result, the search for "GO:0005216" returned numeric(0). How can I split the GO IDs into columns and define the number of columns to search?

ADD REPLY • link 11.1 years ago by pchiang5 ▴ 30

1

If I understand correctly, there could be a problem with your file, i.e. dat=read.table("~/tmp/1.txt") doesn't work, right? Please check the read.table manual for how it treats separators (by default the separator is 'white space', that is one or more spaces, tabs, newlines or carriage returns) and pre-format your data before loading it into R, or try specifying sep="|". You can reformat the text, for example, in vim using substitution. I used your data as posted and it works because all values are separated by spaces.
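One possible way to handle rows with differing numbers of GO IDs, sketched under the assumption that the file has exactly two tab-separated columns (gene ID, then "|"-joined GO IDs). The inline text, the column names `gene` and `go`, and the shortened rows are hypothetical stand-ins for the real file.

```r
# Inline text stands in for the file; with a real file, pass its path
# as the first argument instead of using `text =`
raw <- paste(
  "ENSMUSG00000000001\tGO:0016021|GO:0005216|GO:0005244",
  "ENSMUSG00000000002\tGO:0008150|GO:0005576",
  "ENSMUSG00000000003\tGO:0008150|GO:0005216|GO:0005524|GO:0003674",
  sep = "\n"
)

# An explicit sep = "\t" keeps the whole GO string in one column
dat <- read.table(text = raw, sep = "\t", col.names = c("gene", "go"),
                  stringsAsFactors = FALSE)

# Split each concatenated string on a literal "|" and trim stray spaces
go_list <- lapply(strsplit(dat$go, "|", fixed = TRUE), trimws)

# Long format: one row per gene/GO pair, so the number of IDs per gene no longer matters
long <- data.frame(
  gene = rep(dat$gene, lengths(go_list)),
  go   = unlist(go_list),
  stringsAsFactors = FALSE
)

# Keep the original rows whose GO list contains the term exactly
has_term <- vapply(go_list, function(ids) "GO:0005216" %in% ids, logical(1))
selected <- dat[has_term, ]
print(selected)
```

The long format avoids having to fix a number of columns in advance; an exact match with %in% also rules out partial matches on longer identifiers.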

ADD REPLY • link 11.1 years ago by Pavel Senin ★ 1.9k

score 0 · Answer 3 · 2014-01-21

0

11.1 years ago

pchiang5 ▴ 30

Eventually I worked it out with the grep function:

# dat is the name of my data set and c(2) is the column to search for the GO ID

extractedrows <- dat[grep("GO:0006906", dat[,c(2)]), ]
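The same grep call can be extended to several GO IDs at once. This is only a sketch: the data frame below is a hypothetical stand-in for `dat`, and `wanted` and `pattern` are illustrative names.

```r
# Hypothetical stand-in for `dat`; column 2 holds the concatenated GO IDs
dat <- data.frame(
  V1 = c("ENSMUSG00000000001", "ENSMUSG00000000002", "ENSMUSG00000000003"),
  V2 = c("GO:0016021|GO:0005216|GO:0005244|GO:0005272",
         "GO:0008150|GO:0005576|GO:0005575|GO:0003674",
         "GO:0008150|GO:0005216|GO:0005524|GO:0003674"),
  stringsAsFactors = FALSE
)

wanted <- c("GO:0005244", "GO:0005576")

# In a regular expression "|" means "or", so this builds "GO:0005244|GO:0005576"
pattern <- paste(wanted, collapse = "|")

# grep() returns the indices of rows in which any of the wanted IDs occurs
extractedrows <- dat[grep(pattern, dat[, 2]), ]
print(extractedrows)
```

If no row matches, grep() returns an empty index vector and the result is a data frame with zero rows.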

Thank you all for the suggestions!

ADD COMMENT • link 11.1 years ago by pchiang5 ▴ 30