Hi everyone,
I am trying to get GO terms for Gencode V7 IDs (for example: ENST00000358204.4). I explored many related Biostar posts:
and many more.
From those posts and the BioMart site I got the general idea, and I tried an R script:
library("biomaRt")
# define biomart object
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# query biomart
results <- getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "go_id"),
                 filters = "ensembl_transcript_id", values = "ENST00000537767",
                 mart = mart)
results
When I run this, I get:
ensembl_gene_id ensembl_transcript_id go_id
1 ENSG00000135269 ENST00000537767 GO:0008270
2 ENSG00000135269 ENST00000537767 GO:0046872
This was very promising, but I am working with Gencode V7 transcripts, which have versioned IDs like ENST00000537767.1 or ENST00000358204.4.
When I run the script with those IDs, I get:
[1] ensembl_gene_id ensembl_transcript_id go_id
<0 rows> (or 0-length row.names)
I also checked:
listMarts()
listDatasets(mart)
listAttributes(mart)
to see whether I could find another mart or dataset, but there was none.
I have two questions:
- Can I treat Gencode V7 IDs the same as unversioned ones, i.e. can I use ENST00000537767 in place of ENST00000537767.1?
- Is there any way to get GO terms for Gencode V7 IDs, with biomaRt or any other method (Perl or R preferred)?
Thanks in advance for the help.
I think you misunderstood Bert's answer. He's saying that if you use the correct Ensembl version, then you don't need to specify transcript versions (and in fact Ensembl does not use them); i.e. ENST00000537767 is the same as ENST00000537767.1.
That is exactly what I meant, Neil. So, instead of using http://www.ensembl.org/biomart/martview, you should use http://apr2011.archive.ensembl.org/biomart/martview/, and you should not bother with transcript version numbers, as we don't use these in BioMart. I hope this explains it.
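For anyone who wants to do this from R rather than the web interface, here is a minimal sketch of the approach described above: strip the version suffix from each ID, then query the archive. `strip_version` and `fetch_go` are hypothetical helper names. The `host` value is taken from the archive URL above; the mart name "ENSEMBL_MART_ENSEMBL" is an assumption about how the archive names its mart, so check it with `listMarts(host = ...)` first. The query function is defined but not called here, since it needs network access and the archive may no longer be served.

```r
# Strip a trailing ".<version>" from Gencode-style transcript IDs.
# IDs without a version suffix are returned unchanged.
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

gencode_ids <- c("ENST00000537767.1", "ENST00000358204.4")
plain_ids <- strip_version(gencode_ids)
print(plain_ids)
# [1] "ENST00000537767" "ENST00000358204"

# Hypothetical helper: query an Ensembl archive for GO terms.
# Assumptions: biomaRt is installed, the archive host is still reachable,
# and the mart is named "ENSEMBL_MART_ENSEMBL" on that host.
fetch_go <- function(ids, host = "apr2011.archive.ensembl.org") {
  mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                           dataset = "hsapiens_gene_ensembl",
                           host = host)
  biomaRt::getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "go_id"),
                 filters = "ensembl_transcript_id",
                 values = ids,
                 mart = mart)
}

# Not run (needs network access):
# go_terms <- fetch_go(plain_ids)
```

To report results against the original versioned IDs, you could keep `gencode_ids` and `plain_ids` side by side in a data frame and merge the query result onto it by the unversioned ID.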
Thanks, Bert, for the information. I have been looking since this morning, but I cannot find how to work with the version numbers. I even did a plain search to see whether Ensembl recognizes ENST00000537767.1, but it does not. However, it does recognize ENST00000537767. Is there any other way you think I can get the GO information for these gene/transcript IDs?
Thanks a lot for the clarification. This is exactly what I was looking for.