Has anyone else run into RefSeq IDs that do not seem to identify genes uniquely?
I counted 400+ such IDs in current RefSeq versions. Some examples are: NM_001080141, NM_001080146, NM_001080137, NM_001080138.
In UCSC, these are the coordinates for NM_001080141:
NM_001080141 at chrX:120077416-120080733
NM_001080141 at chrX:120082277-120085594
NM_001080141 at chrX:120096881-120100198
NM_001080141 at chrX:120092020-120095337
NM_001080141 at chrX:120116321-120119638
NM_001080141 at chrX:120067695-120071012
NM_001080141 at chrX:120072556-120075873
NM_001080141 at chrX:120101741-120105058
NM_001080141 at chrX:120106601-120109918
NM_001080141 at chrX:120111461-120114778
Which gene database should I use if I want a unique ID for every gene and isoform?
That all of the above examples map to the X chromosome is not a concern to me. Several segments of X are duplicated; this is part of the biology of a single X in males. In fact, there is a segment of Y that matches a segment of X at nearly 100% sequence identity.
If you grab the sequence for NM_001080141.1 (http://www.ncbi.nlm.nih.gov/nuccore/121949793?report=fasta) and then BLAT it against the human genome, you'll find numerous perfect matches. If you only picked one locus for this sequence, it would be an arbitrary choice. UCSC knownGene picks a single entry from the list (120092019-120095337) but it's not obvious to me why this particular locus was picked. While this is annoying, it's the biology of the sequence.
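If you need one identifier per alignment rather than per transcript, one workaround is to combine the accession with its locus. Below is a minimal Python sketch of that idea, not anything UCSC or RefSeq provides: the function names `group_loci` and `locus_ids` and the `accession@chrom:start-end` format are my own assumptions, and the `HYPOTHETICAL_1` record is made up for contrast. The three NM_001080141 coordinates are copied from the question.

```python
from collections import defaultdict


def group_loci(records):
    """Group (accession, chrom, start, end) tuples by accession."""
    loci = defaultdict(list)
    for acc, chrom, start, end in records:
        loci[acc].append((chrom, start, end))
    return dict(loci)


def locus_ids(records):
    """Build a composite ID per alignment (hypothetical format, not a standard)."""
    return [f"{acc}@{chrom}:{start}-{end}" for acc, chrom, start, end in records]


# Three of the loci listed in the question, plus one made-up single-locus record.
records = [
    ("NM_001080141", "chrX", 120077416, 120080733),
    ("NM_001080141", "chrX", 120082277, 120085594),
    ("NM_001080141", "chrX", 120092020, 120095337),
    ("HYPOTHETICAL_1", "chr1", 100, 200),  # placeholder, not a real accession
]

loci = group_loci(records)
# Accessions that align to more than one locus.
multi = {acc: places for acc, places in loci.items() if len(places) > 1}
ids = locus_ids(records)

for acc, places in multi.items():
    print(acc, "maps to", len(places), "loci")
```

The composite IDs are unique as long as no two rows share both accession and coordinates, but they are tied to one genome assembly, so they would need to be regenerated after a liftover.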
It is useful that UCSC provides an N:1 mapping to Entrez Gene IDs.
@Marcin yes, it is the table kgXref (see http://bioinformatics.oxfordjournals.org/content/22/9/1036.full)