Quickest Way To Get Human Gene Symbols From RefSeq Build 37
4
2
11.8 years ago

Hello,

I was wondering what the quickest way is to get a listing of the human gene symbols from RefSeq Build 37. Thanks in advance for your suggestions.

Fred

gene refseq human • 7.6k views
7
11.8 years ago
 # refGene.txt is tab-separated and tab is cut's default delimiter; column 13 (name2) is the gene symbol
 curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz" |\
   gunzip -c | cut -f 13 |\
   sort -u
3

Why not "sort -u" instead of "uniq | sort | uniq"? http://unixhelp.ed.ac.uk/CGI/man-cgi?sort

0

You're right!

0

Obviously, a single command is much quicker than a few clicks in a web browser.

0

Thanks a lot, Pierre. In the meantime I was looking through the FTP directory at NCBI without finding a nice tab-delimited file that would fit my needs. Very sincerely, Fred

2
11.8 years ago
deanna.church ★ 1.1k

RefSeq and Gene work with HGNC to get correct gene nomenclature on the NCBI annotation. NCBI is now making GFF files for each annotation run (current run is annotation run 104). You can find the files here: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/

The name attribute on the 'gene' lines is the HGNC name, if one exists. If not, it will typically be a 'LOC' designator that is used as a placeholder until HGNC can name it.
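
As a rough sketch of how that could be pulled out on the command line: the function below (gff_gene_names is a made-up helper name, not something NCBI ships) reads GFF3 on stdin, keeps rows whose third column is "gene", and prints the value of the Name= attribute from column 9. It assumes standard GFF3 layout (tab-separated columns, semicolon-separated attributes); I have not checked it against every annotation release.

```shell
# Hypothetical helper: print the unique gene names found in a GFF3 stream.
# Assumes column 3 is the feature type and column 9 holds key=value
# attributes separated by ';' (standard GFF3 layout).
gff_gene_names() {
  awk -F '\t' '$3 == "gene" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^Name=/) { print substr(attrs[i], 6); break }
  }' | sort -u
}
```

Usage would be something like `gunzip -c annotation.gff3.gz | gff_gene_names`, where annotation.gff3.gz is a placeholder for whichever file you download from the directory above.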

1
11.8 years ago

When it comes to gene nomenclature, the source I trust most is the HUGO Gene Nomenclature Committee (HGNC), which provides an always up-to-date gene list here, although you may find more specific information in their downloads section.

That said, if I had to look for a plain list of all current gene symbols, I would go to BioMart, select the latest gene database available (currently Ensembl Genes 69), create no filters, and select only "associated gene name" in the attributes section.

0
8.1 years ago

You can get this from the UCSC Table Browser. Select the genome version and choose RefSeq Genes as the track. This will give you a table with RefSeq IDs and gene names.
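
If you would rather stay on the command line, roughly the same two-column table can be sketched from the refGene.txt dump mentioned in the first answer. The function name refgene_id_symbol is made up for illustration, and it assumes the refGene layout where column 2 is the RefSeq accession and column 13 (name2) is the gene symbol.

```shell
# Hypothetical helper: print unique "RefSeq accession <TAB> gene symbol" pairs.
# Assumes refGene.txt layout on stdin: tab-separated, column 2 = accession,
# column 13 = gene symbol (name2).
refgene_id_symbol() {
  cut -f 2,13 | sort -u
}
```

For hg19 this could be run as `curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz" | gunzip -c | refgene_id_symbol`.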
