Question

Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?

11.4 years ago

merajazizmeraj ▴ 20

I have genomic coordinates from the hg18 build and want to get the corresponding gene symbols. I have tried BioMart, but it only offers hg19. Is there another quick and easy way? I have ~8000 ranges across all the chromosomes.

gene coordinates • 14k views

ADD COMMENT • link updated 6.6 years ago by Roman Valls Guimerà ▴ 620 • written 11.4 years ago by merajazizmeraj ▴ 20

duplicate of

Find out the genes that correspond to my coordinates

ADD REPLY • link 11.4 years ago by Pierre Lindenbaum 164k

How can I get the gene symbols for the regions? Do you know the syntax for that?

ADD REPLY • link 11.4 years ago by merajazizmeraj ▴ 20

More of a partial duplicate. They also want to know how to access hg18 via Biomart. Judging by the archive page (earliest is v54, May 2009), this is not possible.

ADD REPLY • link 11.4 years ago by Neilfws 49k

Archive 54 is the NCBI36 build (aka hg18) so it should work fine.

ADD REPLY • link 11.0 years ago by Emily 24k

score 4 · Answer 1 · 2013-07-18

Here's one way to do it with the UCSC Genome Browser, I think.

Assuming a bash shell, define some parameters:

$ CHR="chr1"
$ START=11000000
$ STOP=12000000

To get the first 10 gene symbols for hg18 within this range, join knownGene to kgXref on the transcript ID:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
    "SELECT kg.chrom, MIN(kg.txStart), MAX(kg.txEnd), x.geneSymbol \
        FROM knownGene kg JOIN kgXref x ON kg.name = x.kgID \
        WHERE kg.chrom = '${CHR}' AND kg.txStart >= ${START} AND kg.txEnd < ${STOP} \
        GROUP BY kg.chrom, x.geneSymbol \
        LIMIT 10;" hg18

The join condition matters. Without it, the query pairs every transcript in the range with every symbol in kgXref, and the output repeats one transcript's coordinates next to unrelated symbols:

+------+----------+----------+-----------+
| chr1 | 11009166 | 11029872 | 16G2      |
| chr1 | 11009166 | 11029872 | 214K23.2  |
| chr1 | 11009166 | 11029872 | 44050     |
| chr1 | 11009166 | 11029872 | 5-OPase   |
| chr1 | 11009166 | 11029872 | 6a8b      |
| chr1 | 11009166 | 11029872 | A121/SUI1 |
| chr1 | 11009166 | 11029872 | A18hnRNP  |
| chr1 | 11009166 | 11029872 | A1BG      |
| chr1 | 11009166 | 11029872 | A1CF      |
| chr1 | 11009166 | 11029872 | A26B1     |
+------+----------+----------+-----------+
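With ~8000 ranges it may be easier to generate the queries from a script. The following is a minimal sketch, not a tested pipeline: the function names `build_symbol_query` and `mysql_command` are hypothetical, and it assumes that overlap (rather than full containment) is what you want, which is a different condition from the `txStart >= START AND txEnd < STOP` filter used above. It only builds the SQL and the `mysql` argument list; running it is left to `subprocess.run`.

```python
def build_symbol_query(chrom, start, stop, limit=None):
    """Return SQL listing knownGene symbols that overlap chrom:start-stop.

    Assumption: UCSC coordinates (0-based start, exclusive end), so two
    intervals overlap when txStart < stop and txEnd > start.
    """
    # Reject anything that is not a plain chromosome name, since the
    # value is interpolated into the SQL text.
    if not chrom.replace("_", "").isalnum():
        raise ValueError("unexpected chromosome name: %r" % chrom)
    start, stop = int(start), int(stop)
    sql = (
        "SELECT DISTINCT x.geneSymbol "
        "FROM knownGene kg JOIN kgXref x ON kg.name = x.kgID "
        f"WHERE kg.chrom = '{chrom}' "
        f"AND kg.txStart < {stop} AND kg.txEnd > {start}"
    )
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def mysql_command(sql, db="hg18"):
    """Return the argument list for the mysql client call used in this answer."""
    return ["mysql", "--user=genome", "--host=genome-mysql.cse.ucsc.edu",
            "-A", "-N", "-e", sql, db]


if __name__ == "__main__":
    query = build_symbol_query("chr1", 11000000, 12000000, limit=10)
    print(" ".join(mysql_command(query)[:-2]), "...")
    print(query)
```

To run a query, pass the list to `subprocess.run(mysql_command(query), capture_output=True, text=True)`. For thousands of ranges, pausing between calls is kinder to the public server.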

score 1 · Answer 2 · 2013-12-03

11.0 years ago

B. Arman Aksoy ★ 1.2k

If you are an R-person, I suggest trying Bioconductor's CNTools:

http://www.bioconductor.org/packages/release/bioc/html/CNTools.html

It is pretty convenient and the how-to document on the page above really helps in terms of getting started.

score 1 · Answer 3 · 2013-12-03

If you list your regions, one per line, like this:

1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
1:174059001:174143000
1:219914001:227775000

You can use the following code to get the corresponding gene symbols.

# Regions file: one "chromosome:start:end" string per line, no header
your_region = read.table("yourtable.csv", stringsAsFactors = FALSE)

library("biomaRt")

# Ensembl release 54 (May 2009) is the NCBI36/hg18 archive.
# Attribute and filter names may differ in this release; check them
# with listAttributes(ensembl54) and listFilters(ensembl54).
ensembl54 = useMart("ENSEMBL_MART_ENSEMBL", host="may2009.archive.ensembl.org", dataset="hsapiens_gene_ensembl")

chr.region = as.character(your_region$V1)

all.results = data.frame()

for (cnt in seq_along(chr.region))
{
    print(cnt)
    # One query per region, restricted to protein-coding genes
    filterlist = list(chr.region[cnt], "protein_coding")
    results = getBM(attributes = c("hgnc_symbol","ensembl_gene_id","entrezgene", "chromosome_name", "start_position", "end_position"),
                    filters = c("chromosomal_region","biotype"), values = filterlist, mart = ensembl54)
    all.results = rbind(all.results, results)
}
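The `chromosome:start:end` strings above follow the Ensembl convention (no `chr` prefix, 1-based inclusive coordinates), whereas the UCSC tables used elsewhere in this thread store a `chr` prefix and a 0-based start. A small sketch of converting between the two, in case you want to feed the same region list to a UCSC query; `parse_region` and `to_ucsc` are hypothetical helper names, and special cases such as MT versus chrM are not handled.

```python
def parse_region(text):
    """Split an Ensembl-style 'chromosome:start:end' string into a tuple."""
    chrom, start, end = text.strip().split(":")
    start, end = int(start), int(end)
    if start > end:
        raise ValueError("start is after end in %r" % text)
    return chrom, start, end


def to_ucsc(chrom, start, end):
    """Convert 1-based inclusive coordinates to UCSC 0-based half-open ones.

    Only the start shifts by one; the end stays the same number.
    Assumption: ordinary chromosome names (1..22, X, Y), not MT/chrM.
    """
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return chrom, start - 1, end


if __name__ == "__main__":
    for line in ["1:9330001:9395000", "1:149242001:149250000"]:
        print(to_ucsc(*parse_region(line)))
```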

score 1 · Answer 4 · 2014-01-02

BioMart does support hg18 (NCBI36). This functionality can be found by going to the e54 archive site, which Ensembl plans to maintain for at least another year and a half. Find it here:

http://may2009.archive.ensembl.org/biomart/martview

If you know Perl, you can also access the archive through the Perl API:

http://may2009.archive.ensembl.org/info/data/api.html

I hope this helps.
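If you would rather script the archive than click through martview, BioMart also accepts an XML query over HTTP. The sketch below only builds the XML and the request URL and does not contact the server. The `martservice` path is taken from the R answer in this thread; the filter and attribute names are copied from that answer too, and whether release 54 exposes them under exactly these names is an assumption to verify in martview. `build_biomart_query` and `build_url` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: the e54 archive serves the usual BioMart martservice endpoint.
MARTSERVICE = "http://may2009.archive.ensembl.org/biomart/martservice"


def build_biomart_query(regions, attributes=("hgnc_symbol", "chromosome_name",
                                             "start_position", "end_position")):
    """Return a BioMart XML query for a list of 'chr:start:end' regions."""
    query = ET.Element("Query", virtualSchemaName="default",
                       formatter="TSV", header="0", uniqueRows="1")
    dataset = ET.SubElement(query, "Dataset",
                            name="hsapiens_gene_ensembl", interface="default")
    # Multiple values for one filter are passed as a comma-separated list.
    ET.SubElement(dataset, "Filter", name="chromosomal_region",
                  value=",".join(regions))
    for attribute in attributes:
        ET.SubElement(dataset, "Attribute", name=attribute)
    return ET.tostring(query, encoding="unicode")


def build_url(regions):
    """Return a GET URL carrying the query; suitable for a few regions only."""
    return MARTSERVICE + "?" + urlencode({"query": build_biomart_query(regions)})


if __name__ == "__main__":
    print(build_biomart_query(["1:9330001:9395000"]))
```

A URL built this way gets long quickly, so for thousands of regions it is probably better to send the regions in chunks.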

score 1 · Answer 5 · 2014-01-02

10.9 years ago

brentp 24k

In python with cruzdb (available here: https://pypi.python.org/pypi/cruzdb):

from cruzdb import Genome
hg18 = Genome('hg18')
hg18.bin_query('refGene', 'chr1', '123456', '223456')

score 0 · Answer 6 · 2018-04-04

In today's (2018) cruzdb this seems to take a bit more work: bin_query returns a SQLAlchemy query object, so you have to execute it and fetch (or iterate over) the rows yourself. The interval bounds also have to be integers rather than strings:

In [9]: from cruzdb import Genome
   ...: hg18 = Genome('hg18')
   ...: hg18.bin_query('refGene', 'chr1', 123456, 223456)
   ...:
Out[9]: <sqlalchemy.orm.query.Query at 0x10b0297d0>

In [10]: from cruzdb import Genome
    ...: hg18 = Genome('hg18')
    ...: q = hg18.bin_query('refGene', 'chr1', 123456, 223456)
    ...:

In [11]: q.statement.execute().fetchall()
Out[11]: [(585, 'NR_039983', 'chr1', '-', 124635L, 130429L, 130429L, 130429L, 3L, '124635,129652,129937,', '129559,129710,130429,', 0L, 'LOC729737', u'unk', u'unk', '-1,-1,-1,')]
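The gene symbol is the 13th field of each returned tuple (`LOC729737` in the row above). A minimal sketch of pulling it out by name rather than by position, using the row from `Out[11]` rewritten with Python 3 integers. The column list is my reading of the UCSC refGene table layout and should be treated as an assumption (in particular the label of the twelfth field); `gene_symbols` is a hypothetical helper, not part of cruzdb.

```python
from collections import namedtuple

# Assumed refGene column order; the row in Out[11] has 16 fields.
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]
RefGene = namedtuple("RefGene", REFGENE_COLUMNS)


def gene_symbols(rows):
    """Return the distinct gene symbols (name2) in first-seen order."""
    seen = []
    for row in rows:
        symbol = RefGene(*row).name2
        if symbol not in seen:
            seen.append(symbol)
    return seen


# The row from Out[11], with the Python 2 long/unicode markers dropped.
row = (585, 'NR_039983', 'chr1', '-', 124635, 130429, 130429, 130429, 3,
       '124635,129652,129937,', '129559,129710,130429,', 0, 'LOC729737',
       'unk', 'unk', '-1,-1,-1,')

if __name__ == "__main__":
    print(gene_symbols([row]))
```

With a live query, `gene_symbols(q.statement.execute().fetchall())` would be the equivalent call, assuming the rows come back in the same column order.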