I Have An Excel File Including Some Regions And I Need The Genes Located In Each Region As Output
3
2
13.2 years ago
Omid ▴ 590

I have an Excel file containing positions like the ones below (based on hg18):

chr1:195005320-195066067

chr16:16462059-16755404

chr17:41527505-41719986

chr3:176568786-177212247

chr15:41676019-41735779

chr1:195011143-195066066

chr6:32595201-32633890

... I want to know which genes are located in each region, listed in a separate column next to each region. Thanks.

gene • 2.7k views
2
13.2 years ago

BioMart will retrieve the genes that reside in these regions. Just follow this link:

http://www.biomart.org/biomart/martview/0cdc89146700420ccb5d1ccf961c219f

Use the filters to specify the chromosomal coordinates, and the attributes to output the coordinates and the genes within those regions. The result can then be exported as a CSV file and opened directly in Excel.
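For pasting many regions into the BioMart region filter at once, the positions from the question first need to be reformatted. Below is a minimal Python sketch of that conversion. It assumes the filter accepts one region per line as Chr:Start:End without the "chr" prefix (please check this against the filter's own help text); the function name to_biomart_region is made up for illustration.

```python
import re

# Matches UCSC-style positions such as "chr1:195005320-195066067".
REGION_RE = re.compile(r"^(?:chr)?([0-9A-Za-z_]+):(\d+)-(\d+)$")


def to_biomart_region(region: str) -> str:
    """Convert 'chr1:100-200' to '1:100:200'.

    The 'Chr:Start:End' target layout is an assumption about what the
    BioMart region filter expects, not something stated in this thread.
    """
    match = REGION_RE.match(region.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom, start, end = match.groups()
    if int(start) > int(end):
        raise ValueError(f"start is greater than end: {region!r}")
    return f"{chrom}:{start}:{end}"


# The first three regions from the question.
regions = [
    "chr1:195005320-195066067",
    "chr16:16462059-16755404",
    "chr17:41527505-41719986",
]

biomart_lines = [to_biomart_region(r) for r in regions]
print("\n".join(biomart_lines))
```

The printed lines can be pasted into the filter box. The coordinates are only reformatted, not converted between assemblies.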

1

Or you may first map your NCBI36 (hg18) regions to GRCh37 (hg19) using the Ensembl Assembly Mapper (http://www.ensembl.org/tools.html) and then use BioMart on the most recent version of Ensembl to retrieve your genes.

0

Warning: your link points to GRCh37 (hg19), not hg18.

0

Thanks, but BioMart does not offer hg18. It uses hg19.

0

It is hg18 if you go back to v54 in the Ensembl archive. You may then build your query based on hg18, although the gene annotation will be from v54 too.

2
13.2 years ago

Googling your query returns the following answer from the UCSC mailing list, which uses the Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables?command=start

To input a batch of regions and output a list of gene names go to the 
table browser and after selecting your assembly of interest select:

     group: Genes and Gene Prediction Tracks
     track: UCSC genes
     table: knownGene

You can paste in regions by clicking on "define regions".

Then select:

     output format: "selected fields from primary and related tables"

Once you click on "get output" this will take you to a menu where you
can specify which fields you want to retrieve from associated tables. 
Now in the "...kgXref fields" section you can select "geneSymbol" or 
other identifiers of interest to be included in the output. Then click 
"get output".
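The Table Browser output is one row per transcript, so it still has to be joined back to the regions to get one line per region. Here is a minimal Python sketch of that step. It assumes the export is tab-separated with columns named chrom, txStart, txEnd and geneSymbol (the real header names in your download may differ, so adjust them), that txStart is 0-based and txEnd is end-exclusive as in UCSC tables, and that the regions are 1-based inclusive. The table contents and the gene names GENE_A, GENE_B and GENE_C are toy placeholders, not real annotation.

```python
import csv
import io
from collections import defaultdict


def parse_region(region):
    """Split 'chr1:200-1000' into ('chr1', 200, 1000)."""
    chrom, span = region.strip().split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def genes_per_region(regions, table_text):
    """Return {region: sorted unique gene symbols overlapping it}."""
    # Group transcript rows by chromosome for a simple lookup.
    by_chrom = defaultdict(list)
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        by_chrom[row["chrom"]].append(
            (int(row["txStart"]), int(row["txEnd"]), row["geneSymbol"])
        )

    result = {}
    for region in regions:
        chrom, start, end = parse_region(region)
        # Overlap test for a 0-based half-open transcript against a
        # 1-based inclusive region; the set drops repeated symbols.
        symbols = {
            symbol
            for tx_start, tx_end, symbol in by_chrom.get(chrom, [])
            if tx_start < end and tx_end >= start
        }
        result[region] = sorted(symbols)
    return result


# Toy stand-in for a Table Browser export (hypothetical rows).
rows = [
    ("chrom", "txStart", "txEnd", "geneSymbol"),
    ("chr1", "100", "500", "GENE_A"),
    ("chr1", "150", "480", "GENE_A"),
    ("chr1", "900", "1200", "GENE_B"),
    ("chr2", "100", "500", "GENE_C"),
]
table_text = "\n".join("\t".join(r) for r in rows)

result = genes_per_region(["chr1:200-1000", "chr2:600-700"], table_text)
for region, genes in result.items():
    # One line per region: the region, a tab, then its genes.
    print(f"{region}\t{','.join(genes)}")
```

The tab-separated lines can be pasted into Excel as two columns. Because the symbols are collected in a set, a gene with several transcripts appears only once per region.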
0

I really appreciate it, Pierre. The problem is that when I select geneSymbol, the output file contains all the names of each gene, which is confusing, especially when the region is long and contains several genes. Do you know of any solution?

2
13.2 years ago
Boboppie ▴ 550

You can also try the genomic region search on metabolicMine: http://www.metabolicmine.org/beta/genomicRegionSearch.do

