Is There An Easy Way Of Getting Gene Symbols From Genomic Coordinates?
11.4 years ago

I have genomic coordinates from the hg18 build and want to get the corresponding gene symbols. I tried BioMart, but it only offers hg19. Is there another quick and easy way? I have ~8000 ranges across all the chromosomes.

gene coordinates • 14k views

How can I get the gene symbols for the regions? Do you know the syntax for that?


More of a partial duplicate. They also want to know how to access hg18 via Biomart. Judging by the archive page (earliest is v54, May 2009), this is not possible.


Archive 54 is the NCBI36 build (aka hg18) so it should work fine.

11.4 years ago

Here's one way to do it with the UCSC Genome Browser's public MySQL server, I think.

Assuming a bash shell, define some parameters:

$ CHR="chr1"
$ START=11000000
$ STOP=12000000

To get the first 10 gene symbols for hg18 within this range, join knownGene to kgXref on the transcript ID (kg.name = x.kgID):

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
    "SELECT DISTINCT kg.chrom, kg.txStart, kg.txEnd, x.geneSymbol \
        FROM knownGene kg JOIN kgXref x ON x.kgID = kg.name \
        WHERE kg.chrom = '${CHR}' AND kg.txStart >= ${START} AND kg.txEnd < ${STOP} \
        LIMIT 10;" hg18

Each output row is chrom, txStart, txEnd, geneSymbol. The join condition matters. Without it, every transcript in the range is paired with every symbol in kgXref, and grouping by symbol gives a result like this, with unrelated symbols attached to the same coordinates:

+------+----------+----------+-----------+
| chr1 | 11009166 | 11029872 | 16G2      |
| chr1 | 11009166 | 11029872 | 214K23.2  |
| chr1 | 11009166 | 11029872 | 44050     |
| chr1 | 11009166 | 11029872 | 5-OPase   |
| chr1 | 11009166 | 11029872 | 6a8b      |
| chr1 | 11009166 | 11029872 | A121/SUI1 |
| chr1 | 11009166 | 11029872 | A18hnRNP  |
| chr1 | 11009166 | 11029872 | A1BG      |
| chr1 | 11009166 | 11029872 | A1CF      |
| chr1 | 11009166 | 11029872 | A26B1     |
+------+----------+----------+-----------+
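
With ~8000 ranges you probably don't want to type one mysql call per range. Here is a minimal Python sketch that only generates the query text for a list of ranges. It assumes the same knownGene and kgXref tables and a join on kgXref.kgID = knownGene.name; the function name build_symbol_query and the example ranges are made up for illustration.

```python
def build_symbol_query(chrom, start, stop):
    """Return SQL selecting gene symbols for transcripts lying within start..stop."""
    # int() rejects non-numeric input before it reaches the SQL text.
    start, stop = int(start), int(stop)
    # Allow names such as chr1 or chr6_cox_hap1, nothing with quotes or spaces.
    if not chrom.replace("_", "").isalnum():
        raise ValueError("unexpected chromosome name: %r" % chrom)
    return (
        "SELECT DISTINCT kg.chrom, kg.txStart, kg.txEnd, x.geneSymbol "
        "FROM knownGene kg JOIN kgXref x ON x.kgID = kg.name "
        "WHERE kg.chrom = '%s' AND kg.txStart >= %d AND kg.txEnd < %d;"
        % (chrom, start, stop)
    )


# Hypothetical example ranges; in practice these would be read from a file.
ranges = [("chr1", 11000000, 12000000), ("chr1", 9330001, 9395000)]
queries = [build_symbol_query(*r) for r in ranges]
print("\n".join(queries))
```

Since the mysql client reads statements from standard input, the printed queries could be piped into the same mysql command (same user, host, and hg18 database, with the -e option dropped). A single query with a temporary table of ranges would likely be faster, but this keeps the sketch close to the one-range example.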
11.1 years ago
B. Arman Aksoy ★ 1.2k

If you are an R-person, I suggest trying Bioconductor's CNTools:

http://www.bioconductor.org/packages/release/bioc/html/CNTools.html

It is pretty convenient and the how-to document on the page above really helps in terms of getting started.

11.1 years ago
User 1933 ▴ 360

If you list your regions like this (chromosome:start:end, one per line):

1:9330001:9395000
1:149242001:149250000
1:171936001:171971000
1:174059001:174143000
1:219914001:227775000

you can use the following code to get the corresponding gene symbols.

library("biomaRt")

# One region per line, formatted as chromosome:start:end
your_region = read.table("yourtable.csv", stringsAsFactors = FALSE)
chr.region = your_region$V1

# Ensembl 54 (May 2009) is the NCBI36/hg18 archive; the jan2013 archive is hg19
ensembl54 = useMart(host = "may2009.archive.ensembl.org", biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")

all.results = data.frame()

# Query one region at a time, keeping protein-coding genes only
for (cnt in seq_along(chr.region))
{
    print(cnt)
    filterlist = list(chr.region[cnt], "protein_coding")
    results = getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "entrezgene", "chromosome_name", "start_position", "end_position"),
                    filters = c("chromosomal_region", "biotype"), values = filterlist, mart = ensembl54)
    all.results = rbind(all.results, results)
}
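
Note that this region format is Ensembl style (no "chr" prefix, 1-based inclusive coordinates), while the UCSC tables use "chr1"-style names with 0-based, half-open coordinates. A minimal Python sketch of converting between the two, in case your ranges come in the other convention; the function names are hypothetical.

```python
def ensembl_to_ucsc(region):
    """Convert an Ensembl-style '1:9330001:9395000' string to a UCSC-style tuple."""
    chrom, start, end = region.strip().split(":")
    # Ensembl names the mitochondrial chromosome MT, UCSC calls it chrM.
    if chrom == "MT":
        chrom = "M"
    # 1-based inclusive start becomes 0-based; the end stays the same.
    return "chr" + chrom, int(start) - 1, int(end)


def ucsc_to_ensembl(chrom, start, end):
    """Convert a UCSC-style (chrom, start, end) tuple to an Ensembl-style string."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name == "M":
        name = "MT"
    # 0-based start becomes 1-based; the end stays the same.
    return "%s:%d:%d" % (name, int(start) + 1, int(end))


print(ensembl_to_ucsc("1:9330001:9395000"))
print(ucsc_to_ensembl("chr1", 9330000, 9395000))
```

Whether your own ranges are 0-based or 1-based depends on where they came from, so check that before applying the offset.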
11.0 years ago

BioMart does support hg18 (NCBI36). This functionality can be found by going to the e54 archive site, which Ensembl plans to maintain for at least another year and a half. Find it here:

http://may2009.archive.ensembl.org/biomart/martview

If you know Perl, you can also access the archive through the Perl API:

http://may2009.archive.ensembl.org/info/data/api.html

I hope this helps.
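
If you would rather script the query than click through martview, BioMart also accepts queries written as XML. Below is a minimal Python sketch that only builds such a query for a list of regions. The dataset, filter, and attribute names (hsapiens_gene_ensembl, chromosomal_region, hgnc_symbol, and so on) are taken from the biomaRt answer in this thread, and I am assuming the e54 archive accepts them in this form; build_biomart_query is a made-up name.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(regions, attributes=("hgnc_symbol", "chromosome_name",
                                             "start_position", "end_position")):
    """Build a BioMart XML query string for chromosome:start:end regions."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",
        "interface": "default",
    })
    # Several regions go into one filter as a comma-separated list.
    ET.SubElement(dataset, "Filter", {
        "name": "chromosomal_region",
        "value": ",".join(regions),
    })
    for attr in attributes:
        ET.SubElement(dataset, "Attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


xml_query = build_biomart_query(["1:9330001:9395000", "1:149242001:149250000"])
print(xml_query)
```

The resulting string would then be sent as the "query" parameter to the archive's martservice endpoint (presumably http://may2009.archive.ensembl.org/biomart/martservice), but I have not verified that against the archive, so treat it as a starting point.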

11.0 years ago
brentp 24k

In Python with cruzdb (available at https://pypi.python.org/pypi/cruzdb):

from cruzdb import Genome
hg18 = Genome('hg18')
hg18.bin_query('refGene', 'chr1', '123456', '223456')
6.7 years ago

With today's (2018) cruzdb it seems to take a bit more work than that: bin_query returns a SQLAlchemy query object, so you have to iterate over it or call fetchall on the executed statement, and the interval bounds must be integers rather than strings:

In [9]: from cruzdb import Genome
   ...: hg18 = Genome('hg18')
   ...: hg18.bin_query('refGene', 'chr1', 123456, 223456)
Out[9]: <sqlalchemy.orm.query.Query at 0x10b0297d0>

In [10]: from cruzdb import Genome
    ...: hg18 = Genome('hg18')
    ...: q = hg18.bin_query('refGene', 'chr1', 123456, 223456)

In [11]: q.statement.execute().fetchall()
Out[11]: [(585, 'NR_039983', 'chr1', '-', 124635L, 130429L, 130429L, 130429L, 3L, '124635,129652,129937,', '129559,129710,130429,', 0L, 'LOC729737', u'unk', u'unk', '-1,-1,-1,')]
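
The gene symbol is the name2 field of each returned row. A minimal Python 3 sketch of pulling it out, assuming the tuples follow the standard refGene column order; the row below is copied from the output above (without the Python 2 L and u markers), and the RefGene namedtuple is just a helper made up for illustration.

```python
from collections import namedtuple

# Column order of the UCSC refGene table (assumed to match the tuples returned).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]
RefGene = namedtuple("RefGene", REFGENE_COLUMNS)

# Row copied from the fetchall() result shown above.
rows = [
    (585, 'NR_039983', 'chr1', '-', 124635, 130429, 130429, 130429, 3,
     '124635,129652,129937,', '129559,129710,130429,', 0, 'LOC729737',
     'unk', 'unk', '-1,-1,-1,'),
]

# A gene can have several transcripts, so collect the symbols into a set.
symbols = sorted({RefGene(*row).name2 for row in rows})
print(symbols)  # ['LOC729737']
```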