I was pointed to this answer some time ago, so, inspired by govardhank's answer, I used UCSC hgTables to download gene summaries indexed by RefSeq mRNA. There is a special table with summaries: hgFixed.refSeqSummary.
The gbff files (for Homo sapiens), parsed with the script from weslfield's comment, gave 6 661 unique summaries. The UCSC table returned 26 140 unique summaries, although these included mouse genes too (and possibly others)*. After mapping the summaries to the subset of human mRNAs that I am currently working with, I got 12 574 unique summaries, which is nearly double the coverage of the gbff parsing.
UCSC also returns data for QPCT, REV1 and NEB, which Dave Curtis mentioned as missing from the gbff files (I am not sure about NICK10; I couldn't find such a gene).
Feel free to use my gist for UCSC table retrieval: ucsc_download.sh. Here is how to use it:
source ucsc_download.sh
get_whole_genome_table summary.tsv.gz genes refGene hgFixed.refSeqSummary gzip
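The mapping step described above (restricting the downloaded summaries to a set of human mRNAs and counting the unique ones) could look roughly like the sketch below. It is not the script I actually used. It assumes that summary.tsv.gz is tab-separated, with the RefSeq accession in the first column and the summary text in the last column, and that accessions in the table carry no version suffix; check the header of your own download first. The function names `load_summaries` and `count_unique` are made up for illustration.

```python
import gzip


def strip_version(accession):
    """Turn 'NM_000014.6' into 'NM_000014' (assumes the table stores unversioned IDs)."""
    return accession.split(".")[0]


def load_summaries(path, wanted_accessions):
    """Return {accession: summary} for the accessions of interest.

    Assumed layout: tab-separated, accession in the first column,
    summary in the last column, optional header lines starting with '#'.
    """
    wanted = {strip_version(acc) for acc in wanted_accessions}
    summaries = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            fields = line.rstrip("\n").split("\t")
            accession, summary = strip_version(fields[0]), fields[-1]
            if accession in wanted:
                summaries[accession] = summary
    return summaries


def count_unique(summaries):
    """Number of distinct summary texts among the selected accessions."""
    return len(set(summaries.values()))
```

Calling `load_summaries("summary.tsv.gz", my_human_mrnas)` with your own set of NM_ accessions drops the non-human rows as a side effect, because only accessions from your list are kept.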
Summary: As of 2017, please use the UCSC tables; they are more complete and easier to fetch and parse. UCSC did a really good job of making those tables.
(*) I am not sure why non-human genes are included, but I know that it is not script-specific: I got the same result when using the web interface. Any advice on how to avoid this would be appreciated.
Thanks David. In case anyone needs it, mapping NG_ IDs can be done using this file: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/RefSeqGene/LRG_RefSeqGene
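A minimal sketch of such a lookup, assuming you have already downloaded the file locally and that it is tab-separated with a header line starting with `#`. The column names `RSG` (the NG_ accession) and `Symbol` used as defaults are assumptions about that header, so verify them against your copy; `load_refseqgene_map` is a hypothetical helper name.

```python
def load_refseqgene_map(path, key_column="RSG", value_column="Symbol"):
    """Build {key_column value: value_column value} from a tab-separated file.

    The first line is treated as the header; a leading '#' is ignored.
    Column names are assumptions and should be checked against the file.
    """
    mapping = {}
    with open(path) as handle:
        header = handle.readline().lstrip("#").rstrip("\n").split("\t")
        key_idx = header.index(key_column)
        value_idx = header.index(value_column)
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= max(key_idx, value_idx):
                continue  # skip short or malformed rows
            # Keep the first value seen for each key (one gene can span several rows).
            mapping.setdefault(fields[key_idx], fields[value_idx])
    return mapping
```

For example, `load_refseqgene_map("LRG_RefSeqGene").get("NG_...")` would return the gene symbol for a given NG_ accession, provided the columns are named as assumed.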