How to retrieve all the genes belonging to one GO term
3.8 years ago

Hi,

I'd like to retrieve all the genes belonging to a GO term.

I tried this (in R):

library(org.Hs.eg.db)

xx <- as.list(org.Hs.egGO2ALLEGS)

xx is a list keyed by GO term; each element holds the Entrez Gene IDs of the genes mapped to that term.

The problem is that when I check this output against other databases (GeneCards, etc.), the results do not match: the gene-to-GO-term correspondence I get from "org.Hs.eg.db" differs from what those databases report.

Do you know a better way? Am I missing something here?

Thanks

GO R • 1.0k views
3.8 years ago

Not R. I wrote http://lindenb.github.io/jvarkit/GoUtils.html

Use GO annotation to retrieve genes associated to GO:0005216 ‘ion channel activity’

join -t $'\t' -1 1 -2 2 \
    <(java -jar dist/goutils.jar -A 'GO:0005216' | cut -f 1 | sort | uniq) \
    <(wget -q -O - "http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/gene_association.goa_human.gz?rev=HEAD" | gunzip -c | grep -v '^!' | cut -f3,5 | uniq | LC_ALL=C sort -t $'\t' -k2,2) |\
sort -t $'\t' -k2,2 |\
grep SCN5A -A 10 -B 10
(...)
GO:0086006  SCN2B
GO:0005244  SCN3A
GO:0005248  SCN3A
GO:0005248  SCN3A
GO:0005248  SCN3B
GO:0086006  SCN3B
GO:0005248  SCN4A
GO:0005248  SCN4A
GO:0005248  SCN4B
GO:0086006  SCN4B
GO:0005244  SCN5A
GO:0005248  SCN5A
GO:0005248  SCN5A
GO:0005248  SCN5A
GO:0005248  SCN5A
GO:0005248  SCN5A
GO:0005248  SCN5A
GO:0086006  SCN5A
GO:0086060  SCN5A
GO:0086061  SCN5A
GO:0086062  SCN5A
GO:0086063  SCN5A
GO:0005248  SCN7A
GO:0005248  SCN7A
GO:0005248  SCN7A
GO:0005248  SCN7A
GO:0005248  SCN8A
GO:0005248  SCN8A
GO:0005248  SCN9A
GO:0005248  SCN9A
GO:0005248  SCN9A
GO:0005272  SCNN1A
(...)

Thanks Pierre. But the GO terms in the list are not GO:0005216, why? I would have expected only a list of genes annotated to this GO term.


Thanks Pierre. But GO terms in the list are not GO:0005216, why?

Because GO is a graph. All those GO IDs are descendants (children) of GO:0005216 "ion channel activity", so a gene annotated to any of them also belongs to the parent term.

e.g: "GO:0086061" is "voltage-gated sodium channel activity involved in bundle of His cell action potential"
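
To illustrate the idea on something small, here is a minimal sketch with made-up data (the X:n identifiers, the gene names and the genes_for_term helper are all hypothetical, not real GO terms or part of goutils). It expands a term to its descendants by following child-to-parent edges, then reports every gene annotated to any term in that set, which is why the output lists child term IDs rather than the term you asked for.

```shell
#!/bin/sh
# Toy data only: the X:n identifiers and gene names are made up, not real GO terms.
workdir=$(mktemp -d)

# child<TAB>parent edges of a small ontology
printf 'X:2\tX:1\nX:3\tX:2\nX:4\tX:1\nX:9\tX:8\n' > "$workdir/edges.tsv"

# gene<TAB>term annotations; each gene is attached to its most specific term
printf 'geneA\tX:1\ngeneB\tX:3\ngeneC\tX:4\ngeneD\tX:9\n' > "$workdir/annot.tsv"

# genes_for_term ROOT: print "term<TAB>gene" for ROOT and all of its descendants
genes_for_term() {
  awk -F '\t' -v root="$1" '
    # first file: remember every child/parent edge
    FNR == NR { child[NR] = $1; parent[NR] = $2; n = NR; next }
    # second file: remember every gene/term annotation
    { gene[++m] = $1; term[m] = $2 }
    END {
      keep[root] = 1
      do {
        # repeat until a full pass adds no new descendant
        changed = 0
        for (i = 1; i <= n; i++)
          if ((parent[i] in keep) && !(child[i] in keep)) {
            keep[child[i]] = 1
            changed = 1
          }
      } while (changed)
      for (j = 1; j <= m; j++)
        if (term[j] in keep) print term[j] "\t" gene[j]
    }' "$workdir/edges.tsv" "$workdir/annot.tsv" | LC_ALL=C sort
}

genes_for_term 'X:1'
```

Querying X:1 prints geneA under X:1, geneB under X:3 and geneC under X:4: geneB is only annotated to the grandchild term X:3, yet it is still returned for X:1. geneD is left out because X:9 sits under a different root.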
