Hello,
I was wondering what the quickest way is to get a list of the human gene symbols from RefSeq Build 37. Thanks in advance for your suggestions.
Fred
# refGene.txt is tab-delimited (the default delimiter for cut); column 13 (name2) holds the gene symbol
curl -s "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz" |\
gunzip -c | cut -f 13 |\
sort -u
RefSeq and Gene work with HGNC to get correct gene nomenclature on the NCBI annotation. NCBI is now making GFF files for each annotation run (current run is annotation run 104). You can find the files here: ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/
The name attribute on the 'gene' lines is the HGNC name, if one exists. If not, it will typically be a 'LOC' designator that is used as a placeholder until HGNC can name it.
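As a rough sketch of how the gene names could be pulled out of one of those GFF files: the function below assumes the file is tab-delimited GFF3, that the feature type is in column 3, and that the attribute is spelled `Name=` inside the semicolon-separated column 9. The function name `gff_gene_names` and the file name in the usage comment are placeholders, not anything provided by NCBI.

```shell
# Print the unique Name attribute values found on 'gene' lines of a GFF3 stream.
# Assumption: attributes are in column 9, separated by ';', and the name is "Name=...".
gff_gene_names() {
  awk -F '\t' '
    $0 !~ /^#/ && $3 == "gene" {
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^Name=/) {
          print substr(attrs[i], 6)   # drop the leading "Name="
        }
      }
    }
  ' | sort -u
}

# Usage (annotation.gff3.gz is a placeholder file name):
# gunzip -c annotation.gff3.gz | gff_gene_names > gene_symbols.txt
```

The output will include the 'LOC' placeholders as well; piping it through `grep -v '^LOC'` would drop them if only HGNC-named genes are wanted.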
When it comes to gene nomenclature, I trust the HUGO Gene Nomenclature Committee (HGNC) the most. It provides an always up-to-date gene list, and you may find more specific information in its downloads section.
That said, if I had to look for a plain list of all current gene symbols, I would go to BioMart, select the latest gene database available (currently Ensembl Genes 69), apply no filters, and select only the "associated gene name" in the attributes section.
You can get this from the UCSC Table Browser. Select the genome version and choose RefSeq Genes as the track. This will give you a table with RefSeq IDs and gene names.
Why not "sort -u" instead of "uniq | sort | uniq"? http://unixhelp.ed.ac.uk/CGI/man-cgi?sort
You're right!
Obviously, a single command is much quicker than a few clicks in a web browser.
Thanks a lot, Pierre. In the meantime I was looking in the FTP directory at NCBI without finding a nice tab-delimited file that would fit my needs. Very sincerely. Fred