Your question doesn't really provide enough information, but maybe you're interested in the knownGene track for a model organism, for which there is already a Bioconductor package:
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# From here you can discover available 'keytypes' and 'columns'
keytypes(TxDb.Hsapiens.UCSC.hg19.knownGene)
columns(TxDb.Hsapiens.UCSC.hg19.knownGene)
# Extract all the transcript ids
txid = keys(TxDb.Hsapiens.UCSC.hg19.knownGene, "TXID")
# and get their corresponding Entrez gene ids
df = select(TxDb.Hsapiens.UCSC.hg19.knownGene, txid, "GENEID", "TXID")
leading to
head(df)
GENEID TXID
1 1 70455
2 1 70456
3 10 31944
4 100 72132
5 1000 65378
6 1000 65379
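The GENEID-to-TXID mapping is one-to-many: genes 1 and 1000 each appear twice above because each has two transcripts. Here is a minimal base-R sketch that groups transcripts by gene. It assumes `df` has the two columns shown. The toy values are copied from the head(df) output rather than re-queried, and the names `tx_by_gene` and `n_tx` are illustrative.

```r
# Toy stand-in for the first rows of df, copied from head(df) above.
# In a real session df comes from select(TxDb.Hsapiens.UCSC.hg19.knownGene, ...).
df <- data.frame(
    GENEID = c("1", "1", "10", "100", "1000", "1000"),
    TXID = c(70455L, 70456L, 31944L, 72132L, 65378L, 65379L),
    stringsAsFactors = FALSE
)

# One list element per gene id, holding that gene's transcript ids
tx_by_gene <- split(df$TXID, df$GENEID)

# Number of transcripts per gene (a named integer vector)
n_tx <- lengths(tx_by_gene)
n_tx
```

On the full table, the same two lines give the transcript count for every Entrez gene in the TxDb.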
If you want more information about the genes, you could load library(org.Hs.eg.db)
and then run
head(select(org.Hs.eg.db, df$GENEID, c("SYMBOL", "GENENAME")))
ENTREZID SYMBOL GENENAME
1 1 A1BG alpha-1-B glycoprotein
2 1 A1BG alpha-1-B glycoprotein
3 10 NAT2 N-acetyltransferase 2 (arylamine N-acetyltransferase)
4 100 ADA adenosine deaminase
5 1000 CDH2 cadherin 2, type 1, N-cadherin (neuronal)
6 1000 CDH2 cadherin 2, type 1, N-cadherin (neuronal)
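The repeated rows (A1BG twice, CDH2 twice) appear because df$GENEID contains each gene once per transcript. One option is to look up each gene once, e.g. select(org.Hs.eg.db, unique(df$GENEID), c("SYMBOL", "GENENAME")), and then join the result back onto the transcripts. The sketch below shows only the join, in base R. It uses toy tables copied from the outputs above, keeps just the SYMBOL column for brevity, and the names `ann` and `tx_ann` are illustrative.

```r
# Toy stand-ins copied from the outputs above; in a real session df comes from
# select(TxDb.Hsapiens.UCSC.hg19.knownGene, ...) and ann from
# select(org.Hs.eg.db, unique(df$GENEID), c("SYMBOL", "GENENAME")).
df <- data.frame(
    GENEID = c("1", "1", "10", "100", "1000", "1000"),
    TXID = c(70455L, 70456L, 31944L, 72132L, 65378L, 65379L),
    stringsAsFactors = FALSE
)
ann <- data.frame(
    ENTREZID = c("1", "10", "100", "1000"),
    SYMBOL = c("A1BG", "NAT2", "ADA", "CDH2"),
    stringsAsFactors = FALSE
)

# Attach a gene symbol to every transcript row; all.x = TRUE keeps
# transcripts whose gene id has no annotation (their SYMBOL becomes NA)
tx_ann <- merge(df, ann, by.x = "GENEID", by.y = "ENTREZID", all.x = TRUE)
tx_ann
```

Note that merge() sorts the result by the join key, so reorder afterwards if the original row order matters.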
Also, BioMart is accessible through library(biomaRt). The package has a good vignette, available from its package landing page. See the introduction to Bioconductor annotation workflows for additional information. If you're more specific about your needs, other approaches might be possible.
For more general annotations, the biomaRt package is very handy. The idea is to discover the available 'mart', 'dataset', 'filters' and 'attributes' via listMarts() etc., and then to compose a query
library(biomaRt)
## listMarts(), listDatasets(useMart("ensembl")), etc.
mart <- useMart("ensembl", "hsapiens_gene_ensembl")
filters <- "ensembl_transcript_id" # info I'll provide, see listFilters(mart)
attr <- c("ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id") # info I want, see listAttributes(mart)
values <- c("ENST00000275493", "ENST00000344576") # info I have
and then the query
getBM(attr, filters, values, mart)
ensembl_gene_id ensembl_transcript_id ensembl_peptide_id
1 ENSG00000146648 ENST00000344576 ENSP00000345973
2 ENSG00000146648 ENST00000275493 ENSP00000275493
An alternative to the final line, consistent with the use of select() in other annotation resources, is
select(mart, values, attr, filters)
ensembl_gene_id ensembl_transcript_id ensembl_peptide_id
1 ENSG00000146648 ENST00000344576 ENSP00000345973
2 ENSG00000146648 ENST00000275493 ENSP00000275493
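As the output above shows, rows need not come back in the order of `values`: ENST00000275493 was supplied first but is returned second. Here is a minimal base-R sketch that restores the input order with match(). It uses a toy copy of the result shown above rather than a live query, and the names `res`, `idx` and `res_ordered` are illustrative.

```r
# Toy stand-in for the getBM()/select() result shown above
res <- data.frame(
    ensembl_gene_id = c("ENSG00000146648", "ENSG00000146648"),
    ensembl_transcript_id = c("ENST00000344576", "ENST00000275493"),
    ensembl_peptide_id = c("ENSP00000345973", "ENSP00000275493"),
    stringsAsFactors = FALSE
)
values <- c("ENST00000275493", "ENST00000344576")

# Position in res of each input id; NA for any id that returned no row
idx <- match(values, res$ensembl_transcript_id)

# Rows of res in the same order as values
res_ordered <- res[idx, ]
res_ordered$ensembl_peptide_id
```

match() takes only the first hit for each id, so this is suitable when each input maps to a single row. For one-to-many results, merge() or split() keep all rows.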
In truth, I 'discovered' the relevant marts, datasets, etc. partly in R and partly by navigating the Ensembl BioMart web interface. Don't forget to check out the biomaRt vignette.
Hi,
I have a related question. I noticed that through biomaRt I can only access the Homo sapiens Ensembl dataset "Homo sapiens genes (GRCh38.p2)". I also want to translate Ensembl transcript IDs into RefSeq IDs, but my Ensembl transcript IDs are from the GRCh37/hg19 build. Do you have any advice on how to retrieve these IDs through biomaRt, as in the example above, or on a better way to do it?
I'd really appreciate your advice!