Find Out The Genes That Correspond To My Coordinates
Dear All,
I have the following coordinates
1 chr1 [ 9933699, 9934385] |
2 chr1 [ 88255056, 88257357] |
How can I find out which genes are located near or within the coordinates above? I would like to get a RefSeq name and not Ensembl names such as ENSMUSG00000093178 or NM_00234.
Could you please give me a guideline for that?
Thank you in advance
Best regards
Lena
Tags: chip-seq, exon, intron, peak-calling
Asked 12.5 years ago by e.karasmani (last updated 12.5 years ago by Ian)
Using the public MySQL server of UCSC:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select distinct
name,
chrom,
txStart,
txEnd,
IF(NOT(txEnd < 9933699 OR txStart > 9934385), 0, IF(txStart > 9934385,txStart-9934385,9933699-txEnd)) as distance
from refGene where chrom="chr1" order by distance limit 20'
+--------------+-------+----------+----------+----------+
| name | chrom | txStart | txEnd | distance |
+--------------+-------+----------+----------+----------+
| NM_001012329 | chr1 | 9908333 | 9970316 | 0 |
| NM_020248 | chr1 | 9908333 | 9970316 | 0 |
| NM_001009566 | chr1 | 9789078 | 9884550 | 49149 |
| NM_014944 | chr1 | 9789078 | 9884550 | 49149 |
| NM_032368 | chr1 | 9989775 | 10002826 | 55390 |
| NM_022787 | chr1 | 10003485 | 10045556 | 69100 |
| NM_052960 | chr1 | 10057254 | 10076078 | 122869 |
| NM_005026 | chr1 | 9711789 | 9789172 | 144527 |
| NM_001105562 | chr1 | 10093040 | 10241296 | 158655 |
| NM_006048 | chr1 | 10093040 | 10241296 | 158655 |
| NR_027045 | chr1 | 9712667 | 9714644 | 219055 |
| NM_001130924 | chr1 | 9648931 | 9674935 | 258764 |
| NM_001010866 | chr1 | 9648931 | 9665020 | 268679 |
| NM_032315 | chr1 | 9599527 | 9642831 | 290868 |
| NM_015074 | chr1 | 10270763 | 10441661 | 336378 |
| NM_183416 | chr1 | 10270763 | 10368655 | 336378 |
| NM_025106 | chr1 | 9352940 | 9429590 | 504109 |
| NM_002631 | chr1 | 10459084 | 10480201 | 524699 |
| NM_198544 | chr1 | 10490158 | 10512060 | 555773 |
| NM_199006 | chr1 | 10490158 | 10512060 | 555773 |
+--------------+-------+----------+----------+----------+
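The `distance` expression in the query reports 0 when a transcript overlaps the region, and otherwise the gap between the nearer edges (downstream: txStart minus region end; upstream: region start minus txEnd). A minimal Python sketch of the same ranking, using a few rows copied from the output above; `distance_to_region` is a hypothetical helper name, and coordinates are compared exactly as the SQL compares them, with no 0-/1-based adjustment:

```python
def distance_to_region(tx_start, tx_end, start, end):
    """Mirror of the SQL IF(...) expression: 0 on overlap, else the gap."""
    if not (tx_end < start or tx_start > end):
        return 0  # transcript overlaps the query region
    if tx_start > end:
        return tx_start - end  # transcript lies downstream of the region
    return start - tx_end  # transcript lies upstream of the region


# Query region from the question (chr1).
REGION_START, REGION_END = 9933699, 9934385

# A few (name, txStart, txEnd) rows copied from the refGene output above.
transcripts = [
    ("NM_032368", 9989775, 10002826),
    ("NM_020248", 9908333, 9970316),
    ("NM_005026", 9711789, 9789172),
    ("NM_014944", 9789078, 9884550),
]

# Sort by distance, like "order by distance" in the query.
ranked = sorted(
    ((name, distance_to_region(s, e, REGION_START, REGION_END))
     for name, s, e in transcripts),
    key=lambda row: row[1],
)
for name, dist in ranked:
    print(name, dist)
```

The recomputed distances (0, 49149, 55390, 144527) match the corresponding rows of the MySQL output.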
Use bedtools. Download the RefSeq genes from UCSC, then have a look at closestBed (nearest gene) and intersectBed (overlapping genes).
EDIT: First you have to convert your input (chromosome, coordinates) into a BED file.
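Making the BED file is mostly a matter of coordinate convention: BED intervals are 0-based and half-open, while the coordinates in the question look like an R IRanges/GRanges print, which is 1-based and closed (an assumption about where they came from). A minimal Python sketch of the conversion, plus a naive version of the basic overlap test intersectBed applies, checked against one refGene row from the query output above (UCSC tables also store 0-based starts). `to_bed_intervals` and `overlaps` are hypothetical helper names, not bedtools functions:

```python
# Regions from the question, assumed to be 1-based closed intervals
# (the IRanges/GRanges convention).
regions = [("chr1", 9933699, 9934385), ("chr1", 88255056, 88257357)]


def to_bed_intervals(rows):
    """Convert 1-based closed intervals to 0-based half-open BED intervals."""
    return [(chrom, start - 1, end) for chrom, start, end in rows]


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open overlap test: the intervals share at least one base."""
    return a_start < b_end and b_start < a_end


bed_intervals = to_bed_intervals(regions)
# Tab-separated text you could save as e.g. regions.bed (hypothetical name).
bed_text = "".join(f"{c}\t{s}\t{e}\n" for c, s, e in bed_intervals)
print(bed_text, end="")

# One RefSeq transcript from the refGene output above.
genes = [("chr1", 9908333, 9970316, "NM_020248")]

# Naive all-pairs intersection; bedtools does this far more efficiently.
hits = [
    (chrom, start, end, name)
    for chrom, start, end in bed_intervals
    for g_chrom, g_start, g_end, name in genes
    if chrom == g_chrom and overlaps(start, end, g_start, g_end)
]
print(hits)
```

Only the first region overlaps NM_020248; the second region (chr1:88255056-88257357) would need the full refGene table to find its neighbours.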
A simple UCSC Table Browser search of these regions should do the trick, unless you need something more robust for larger data sets (NM_ accessions are RefSeq IDs, as mentioned above):
choose the species (genome) and assembly
choose group: Genes and Gene Predictions
choose track: RefSeq Genes, table: refGene
region: define the regions above
output format: selected fields from primary and related tables (choose at minimum name and name2, the alternate gene name)
This gives a tab-delimited text file of gene names. For example, the first region above, chr1:9933699-9934385 (assuming human, hg19), gives (cleaned for display purposes):
name chrom txStart txEnd name2
NM_020248 chr1 9908333 9970316 CTNNBIP1
NM_001012329 chr1 9908333 9970316 CTNNBIP1
You could use related tables to pull out other IDs and GO terms, etc.
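Since several RefSeq accessions (name) can map to the same gene (name2), it is handy to collapse the downloaded table per gene symbol. A minimal Python sketch, assuming the file is tab-delimited with a header line that may start with `#`; the table text reuses the two rows shown above, and `gene_symbols` is a hypothetical helper name:

```python
import csv

# Table Browser output with the two rows shown above (tab-delimited).
table_text = (
    "#name\tchrom\ttxStart\ttxEnd\tname2\n"
    "NM_020248\tchr1\t9908333\t9970316\tCTNNBIP1\n"
    "NM_001012329\tchr1\t9908333\t9970316\tCTNNBIP1\n"
)


def gene_symbols(text):
    """Map each gene symbol (name2) to the RefSeq accessions (name) behind it."""
    lines = text.splitlines()
    lines[0] = lines[0].lstrip("#")  # drop a leading comment marker from the header
    symbols = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        symbols.setdefault(row["name2"], []).append(row["name"])
    return symbols


print(gene_symbols(table_text))
```

For this region both transcripts collapse to the single gene CTNNBIP1.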
Answered 12.5 years ago by Treylathe (last updated 5.3 years ago by Ram)
NM_002341 is a RefSeq accession number.
If you want the official gene name rather than an accession number (assuming these coordinates are from Homo sapiens), you could have a look at this.
An R-specific option is the Bioconductor package ChIPpeakAnno.
Thank you very much!
However, is there a way to do this in R (since everything I am doing is in R)?
I have my coordinates in IRanges or a data.frame (if that helps).
Thank you in advance.
Best regards, Lena