Where can I get a comprehensive list of genes with gene start, gene end, and chromosome for build 37?
2.9 years ago
Star

Hi all,

I am trying to annotate a list of genes with their chromosome, gene start, and gene end (build 37). I mapped most of the genes using a list downloaded from BioMart/UCSC, but 25 genes are still missing from that list, for example PRAG1, CCL4L2, etc. I found one file that does contain these genes, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz, but when I checked the b37 positions of some random genes (e.g., …, 93854920 - 93954309), they did not match the coordinates shown in UCSC. Is there a workaround? My goal is simple, but it has turned out to be more complicated than I expected :(

Any leads would be much appreciated.

genome build37 genes R

2.9 years ago

You could get genes from GENCODE, which publishes its release 39 annotation lifted over to GRCh37: https://www.gencodegenes.org/human/release_39lift37.html

Then keep only the gene records and convert them from GFF3 to BED with BEDOPS convert2bed, pulling out the desired gene name:

$ wget -qO- https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/GRCh37_mapping/gencode.v39lift37.annotation.gff3.gz | gunzip -c | awk '($3=="gene")' | convert2bed -i gff --attribute-key="gene_name" - > genes.bed

Passing --attribute-key="gene_name" to convert2bed -i gff retrieves the HGNC symbol (PRAG1, CCL4L2, etc.) from the GFF file where available and places it in the ID field (column 4) of the resulting BED file. If the HGNC symbol is not available, the Ensembl gene ID is used instead.
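For the last step, annotating your own list of symbols, here is a minimal sketch that joins a symbol list against genes.bed and reports each gene's chromosome, start, and end. It assumes genes.bed comes from the command above (gene name in column 4) and a hypothetical symbols.txt with one HGNC symbol per line; annotate_symbols is an illustrative name, not part of any tool. Because BED starts are 0-based, the sketch adds 1 to the start so the output is 1-based, like the UCSC browser display. That off-by-one is worth keeping in mind whenever positions from different sources are compared.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report build-37 coordinates for a list of symbols.
# Assumes: $1 is a BED file with chrom, 0-based start, end, and gene name in column 4;
#          $2 is a text file with one gene symbol per line.
# Output (tab-separated): symbol, chrom, 1-based start, end,
# or "symbol<TAB>NOT_FOUND" when the symbol is absent from the BED file.
annotate_symbols() {
  local bed="$1" symbols="$2"
  awk 'BEGIN { FS = OFS = "\t" }
       # First file (the BED): collect every record per gene name.
       # A symbol can map to more than one record, so hits are appended, not overwritten.
       FILENAME == ARGV[1] {
         hit[$4] = hit[$4] $4 OFS $1 OFS ($2 + 1) OFS $3 ORS
         next
       }
       # Second file (the symbol list): print stored hits or a NOT_FOUND marker.
       $1 != "" {
         if ($1 in hit) printf "%s", hit[$1]
         else print $1, "NOT_FOUND"
       }' "$bed" "$symbols"
}
```

Usage would be something like annotate_symbols genes.bed symbols.txt > annotated.tsv; any symbol still reported as NOT_FOUND may be listed under a different (older or newer) name in this GENCODE release.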

Ref. https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/convert2bed.html
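If BEDOPS isn't installed, roughly the same conversion can be sketched with awk alone. This is a stand-in under stated assumptions, not a reimplementation of convert2bed: it writes only six BED columns, takes column 4 from the gene_name attribute with a fallback to the ID attribute, and does not decode GFF3 percent-escapes. gff3_genes_to_bed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): GFF3 "gene" records on stdin -> 6-column BED on stdout.
# Column 4 is gene_name if present, otherwise the ID attribute.
gff3_genes_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       # Skip header/comment lines and keep only gene features.
       !/^#/ && $3 == "gene" {
         name = ""; id = ""
         # Column 9 holds key=value pairs separated by semicolons.
         n = split($9, attrs, ";")
         for (i = 1; i <= n; i++) {
           eq = index(attrs[i], "=")
           if (eq == 0) continue
           key = substr(attrs[i], 1, eq - 1)
           val = substr(attrs[i], eq + 1)
           if (key == "gene_name") name = val
           else if (key == "ID") id = val
         }
         if (name == "") name = id
         # GFF3 is 1-based and fully closed; BED is 0-based and half-open,
         # so only the start moves by one.
         print $1, $4 - 1, $5, name, ".", $7
       }'
}
```

Plugged into the same download as above, this would look like:

$ wget -qO- https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_39/GRCh37_mapping/gencode.v39lift37.annotation.gff3.gz | gunzip -c | gff3_genes_to_bed | LC_ALL=C sort -k1,1 -k2,2n > genes.bed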

2.9 years ago
Papyrus

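Also, since you tagged the question with R, once you have downloaded an annotation file such as the one in Alex's example, you could do:

library(GenomicFeatures)
# Build a TxDb from the GRCh37-lifted GENCODE GFF3 downloaded above
txdb <- makeTxDbFromGFF(file = "gencode.v39lift37.annotation.gff3.gz", format = "gff3")
# One GRanges entry per gene (chromosome, 1-based start, end, strand);
# gene_id holds Ensembl IDs, so symbols such as PRAG1 must be matched separately
gene_ranges <- genes(txdb)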