Find gene regions (START and END) using gene IDs
17 months ago
Mahan ▴ 70

I have a list of gene IDs. Is there a way to find the gene regions (START-END) for them on the GRCh37 build? TIA

gene GRCh37 • 735 views

What have you tried?

17 months ago
GenoMax 147k

A hacky answer: build a table of coordinates for every gene in the annotation, then pull out the genes you need from that table.

Get the GRCh37 GTF file here: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/GRCh37_mapping/gencode.v43lift37.basic.annotation.gtf.gz

$ zcat gencode.v43lift37.basic.annotation.gtf.gz | awk -F "\t|;" '{OFS="\t"}{if ($3 == "gene") print $11,$4,$5}' | sed -e 's/gene_name//' -e 's/"//g' > genes_37
$ head genes_37
  DDX11L1       12010   13670
  WASH7P        14404   29570
  MIR1302-2HG   29554   31109
  FAM138A       34554   36081
  OR4G4P        52473   53312
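
The one-liner above picks gene_name by its position in the attribute column ($11), so it depends on the attributes always appearing in the same order. A minimal sketch of a variant that looks attributes up by key and also keeps the chromosome and gene_id (useful if your list holds Ensembl-style IDs rather than gene symbols; note that gene_id values in the GTF may carry a version suffix). The function name extract_genes, the file sample.gtf, and the ID GENE_ID_1 are made up for illustration; only the 9-column GTF layout is assumed.

```shell
# Hypothetical two-line sample in GTF layout (tab-separated, 9 columns),
# standing in for the real annotation file. GENE_ID_1 is a placeholder ID.
printf 'chr1\tHAVANA\tgene\t34554\t36081\t.\t-\t.\tgene_id "GENE_ID_1"; gene_type "lncRNA"; gene_name "FAM138A"; level 2;\n' > sample.gtf
printf 'chr1\tHAVANA\ttranscript\t34554\t36081\t.\t-\t.\tgene_id "GENE_ID_1"; transcript_id "TX_ID_1"; gene_name "FAM138A";\n' >> sample.gtf

# Print name, gene_id, chromosome, start, end for every "gene" feature.
# Reads the files given as arguments, or stdin if none are given.
extract_genes() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    $3 == "gene" {
      id = ""; name = ""
      n = split($9, attrs, ";")          # attribute column: key "value"; ...
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        sub(/^ +/, "", a)                # drop the space after each ";"
        if (a ~ /^gene_id "/)   { id = a;   sub(/^gene_id "/, "", id);     sub(/"$/, "", id) }
        if (a ~ /^gene_name "/) { name = a; sub(/^gene_name "/, "", name); sub(/"$/, "", name) }
      }
      print name, id, $1, $4, $5
    }' "$@"
}

extract_genes sample.gtf
```

On the real file this would be run as `zcat gencode.v43lift37.basic.annotation.gtf.gz | extract_genes > genes_37`. The gene name stays in column 1, so the lookup step works the same way.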

Put your genes of interest in a file called id, one gene name per line.

$ more id
FAM138A
OR4G4P

Grab the start and end coordinates from the genes_37 file.

$ grep -f id -w genes_37 
  FAM138A       34554   36081
  OR4G4P        52473   53312
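
One caveat with grep -w: a hyphen counts as a word boundary, so a name like MIR1302 in the id file would also pull out the MIR1302-2HG line, and each line of id is treated as a regular expression. A minimal sketch of an exact-match lookup on column 1 with awk instead; the function name lookup_genes and the demo files genes_demo and id_demo are made up for illustration.

```shell
# Hypothetical stand-ins for the genes_37 and id files.
printf 'FAM138A\t34554\t36081\nOR4G4P\t52473\t53312\nMIR1302-2HG\t29554\t31109\n' > genes_demo
printf 'FAM138A\nMIR1302\n' > id_demo

# $1 = file of wanted names (one per line), $2 = table with the name in column 1.
# The first pass stores the wanted names; the second prints table rows whose
# first field is exactly one of them. Assumes the names file is not empty.
lookup_genes() {
  awk 'NR == FNR { want[$1]; next } ($1 in want)' "$1" "$2"
}

lookup_genes id_demo genes_demo
```

Here only the FAM138A row is printed; MIR1302 does not match MIR1302-2HG. Because awk splits on any whitespace by default, this also works on a genes_37 file that has leading spaces before the gene name.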

Thank you very much! That's really helpful.
