I have genomic coordinates from the hg18 build and want to get the gene symbols. I have tried BioMart, but it only offers hg19. Is there another quick and easy way? I have ~8000 ranges across all chromosomes.
Here's one way to do it with the UCSC Genome Browser's public MySQL server.
Assuming a bash shell, define some parameters:
$ CHR="chr1"
$ START=11000000
$ STOP=12000000
To get the first 10 gene symbols in hg18 whose transcripts lie entirely within this range:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
"SELECT kg.chrom, MIN(kg.txStart), MAX(kg.txEnd), x.geneSymbol \
FROM knownGene kg JOIN kgXref x ON kg.name = x.kgID \
WHERE kg.chrom = '${CHR}' AND kg.txStart >= ${START} AND kg.txEnd < ${STOP} \
GROUP BY kg.chrom, x.geneSymbol \
ORDER BY MIN(kg.txStart) \
LIMIT 10;" hg18
The join on kg.name = x.kgID pairs each transcript with its own symbol; without it, every transcript is paired with every row of kgXref and the symbols are meaningless. Each output row gives the chromosome, the leftmost transcript start, the rightmost transcript end, and the gene symbol.
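The query above keeps only transcripts that lie entirely inside [START, STOP), so a gene straddling a boundary is missed; the usual overlap test is txStart < STOP AND txEnd > START. With ~8000 ranges it may be quicker to download the knownGene/kgXref join once (no range filter) and do the overlap locally. Below is a minimal Python sketch of that local step, assuming the records have already been loaded as (chrom, txStart, txEnd, symbol) tuples in UCSC's 0-based, half-open coordinates; the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(genes):
    """Group (chrom, start, end, symbol) records by chromosome, sorted by start.

    Coordinates are assumed to be 0-based, half-open, as in UCSC tables.
    """
    index = defaultdict(list)
    for chrom, start, end, symbol in genes:
        index[chrom].append((start, end, symbol))
    starts = {}
    for chrom, records in index.items():
        records.sort()
        # Parallel list of start positions, used for binary search.
        starts[chrom] = [start for start, _, _ in records]
    return dict(index), starts


def symbols_overlapping(index, starts, chrom, q_start, q_end):
    """Sorted unique symbols whose [start, end) overlaps [q_start, q_end)."""
    records = index.get(chrom, [])
    # Genes starting at or after q_end cannot overlap, so stop there.
    stop = bisect_left(starts.get(chrom, []), q_end)
    # Of the remaining genes, keep those that end after the query starts.
    return sorted({sym for start, end, sym in records[:stop] if end > q_start})
```

You would then call symbols_overlapping once per input range. Each lookup still scans every gene that starts before the query end, so for much larger inputs an interval tree or a tool such as bedtools intersect is the usual next step.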
If you are an R user, I suggest trying Bioconductor's CNTools:
http://www.bioconductor.org/packages/release/bioc/html/CNTools.html
It is pretty convenient, and the how-to document on the page above really helps with getting started.
If you list your regions like this:
1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
1:174059001:174143000
1:219914001:227775000
you can use the following code to get the corresponding gene symbols.
library("biomaRt")
# One region per line in chrom:start:end form, e.g. 1:9330001:9395000
your_region = read.table("yourtable.csv")
# Ensembl 54 (May 2009) is the last release on NCBI36/hg18; the jan2013
# archive is Ensembl 70 (GRCh37/hg19). Check mart names with listMarts(host = ...).
ensembl54 = useMart("ENSEMBL_MART_ENSEMBL", host = "may2009.archive.ensembl.org",
                    dataset = "hsapiens_gene_ensembl")
chr.region = as.character(your_region$V1)
all.results = data.frame()
for (cnt in seq_along(chr.region)) {
  print(cnt)  # progress
  # Query one region at a time, protein-coding genes only
  filterlist = list(chr.region[cnt], "protein_coding")
  results = getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "entrezgene",
                                 "chromosome_name", "start_position", "end_position"),
                  filters = c("chromosomal_region", "biotype"),
                  values = filterlist, mart = ensembl54)
  all.results = rbind(all.results, results)
}
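Region strings such as 1:9330001:9395000 follow Ensembl conventions: chromosome names without the chr prefix and 1-based, inclusive coordinates. UCSC tables such as knownGene or refGene use chr-prefixed names and 0-based, half-open coordinates. If you want to run the same list against UCSC, a small converter keeps the two conventions straight. This is a sketch; the helper name is hypothetical and it assumes every line has exactly chrom:start:end (no strand field).

```python
def ensembl_region_to_ucsc(region):
    """Convert 'chrom:start:end' (Ensembl style, 1-based inclusive) to
    ('chrN', start0, end) in UCSC style (0-based, half-open)."""
    chrom, start, end = region.strip().split(":")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start > end in {region!r}")
    # Ensembl names lack the 'chr' prefix; UCSC uses it, and 'chrM' for 'MT'.
    ucsc_chrom = "chrM" if chrom == "MT" else "chr" + chrom
    # 1-based inclusive [start, end] covers the same bases as 0-based [start - 1, end).
    return ucsc_chrom, start - 1, end
```

The end coordinate stays the same; only the start shifts by one.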
BioMart does support hg18 (NCBI36) through the Ensembl 54 (e54) archive site, which Ensembl plans to maintain for at least another year and a half. Find it here:
http://may2009.archive.ensembl.org/biomart/martview
If you know Perl, you can also access the archive through the Perl API:
http://may2009.archive.ensembl.org/info/data/api.html
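Without R or Perl, BioMart's martservice endpoint also accepts XML queries over HTTP. A minimal Python sketch that only builds such a query (it does not send it) follows. It assumes the e54 archive still serves /biomart/martservice and that the dataset, filter and attribute names below (hsapiens_gene_ensembl, chromosomal_region, hgnc_symbol, ...) exist in that release; check them in the archive's MartView first. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def biomart_query_xml(regions, dataset="hsapiens_gene_ensembl",
                      attributes=("hgnc_symbol", "chromosome_name",
                                  "start_position", "end_position")):
    """Build a BioMart XML query filtering on chromosomal_region."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    # BioMart takes multiple filter values as one comma-separated string.
    ET.SubElement(ds, "Filter", name="chromosomal_region",
                  value=",".join(regions))
    for attr in attributes:
        ET.SubElement(ds, "Attribute", name=attr)
    return ET.tostring(query, encoding="unicode")


def martservice_url(xml, host="http://may2009.archive.ensembl.org"):
    """GET URL for the (assumed) martservice endpoint; nothing is sent here."""
    return host + "/biomart/martservice?" + urlencode({"query": xml})
```

With ~8000 regions a single GET URL would be very long, so sending the regions in batches (or as a POST) is probably safer.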
I hope this helps.
In Python, with cruzdb (available here: https://pypi.python.org/pypi/cruzdb):
from cruzdb import Genome
hg18 = Genome('hg18')
hg18.bin_query('refGene', 'chr1', '123456', '223456')
With the current (2018) cruzdb, it seems to take a bit more work than that: bin_query returns an SQLAlchemy query that you have to iterate over or fetchall() from, and the interval bounds should be given as integers instead of strings:
In [9]: from cruzdb import Genome
...: hg18 = Genome('hg18')
...: hg18.bin_query('refGene', 'chr1', 123456, 223456)
Out[9]: <sqlalchemy.orm.query.Query at 0x10b0297d0>
In [10]: q = hg18.bin_query('refGene', 'chr1', 123456, 223456)
In [11]: q.statement.execute().fetchall()
Out[11]: [(585, 'NR_039983', 'chr1', '-', 124635L, 130429L, 130429L, 130429L, 3L, '124635,129652,129937,', '129559,129710,130429,', 0L, 'LOC729737', u'unk', u'unk', '-1,-1,-1,')]
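The first field of that refGene row (585) is UCSC's bin column, and the gene symbol is in the name2 field (LOC729737). As I understand it, bin_query relies on this column to narrow the search. Below is a minimal Python sketch of UCSC's standard binning scheme as described in the Genome Browser source (binRange.c), for coordinates below 512 Mb; the function names and the SQL helper are my own, and the SQL only applies to tables that carry a bin column and txStart/txEnd fields, such as refGene.

```python
# UCSC "standard" binning scheme (coordinates < 512 Mb).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)  # 585, 73, 9, 1, 0
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each level up is 8x larger


def bin_from_range(start, end):
    """Smallest bin that fully contains the 0-based, half-open [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range out of bounds for the standard binning scheme")


def bins_for_range(start, end):
    """Every bin that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


def overlap_sql(table, chrom, start, end):
    """Overlap query using the bin column (hypothetical helper; ints only)."""
    bins = ", ".join(str(b) for b in bins_for_range(start, end))
    return (f"SELECT * FROM {table} WHERE chrom = '{chrom}' "
            f"AND bin IN ({bins}) AND txStart < {end} AND txEnd > {start}")
```

As a check, bin_from_range(124635, 130429) returns 585, the bin reported for NR_039983 above. The bin filter only prunes candidate rows; the txStart/txEnd comparison is what decides overlap.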
Duplicate of: Find out the genes that correspond to my coordinates
How can I get the gene symbols for the regions? Do you know the syntax for that?
More of a partial duplicate. They also want to know how to access hg18 via Biomart. Judging by the archive page (earliest is v54, May 2009), this is not possible.
Archive 54 is the NCBI36 build (aka hg18) so it should work fine.