Question

Non-Unique RefSeq IDs?

2

13.7 years ago

Luqman ▴ 30

Hi,

Has anyone else run into RefSeq IDs that do not seem to identify genes uniquely?

I counted 400+ of them in current RefSeq versions. Some examples are: NM_001080141, NM_001080146, NM_001080137, NM_001080138.

In UCSC, these are the coordinates for NM_001080141:
NM_001080141 at chrX:120077416-120080733
NM_001080141 at chrX:120082277-120085594
NM_001080141 at chrX:120096881-120100198
NM_001080141 at chrX:120092020-120095337
NM_001080141 at chrX:120116321-120119638
NM_001080141 at chrX:120067695-120071012
NM_001080141 at chrX:120072556-120075873
NM_001080141 at chrX:120101741-120105058
NM_001080141 at chrX:120106601-120109918
NM_001080141 at chrX:120111461-120114778
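
For anyone who wants to reproduce the count, here is a minimal Python sketch of how such accessions could be flagged. It assumes the annotation has already been reduced to (accession, chrom, start, end) tuples; the `rows` list simply holds the ten coordinates listed above, and the function names are my own, not part of any UCSC or NCBI tool.

```python
from collections import defaultdict

# The ten placements quoted above, as (accession, chrom, start, end).
rows = [
    ("NM_001080141", "chrX", 120077416, 120080733),
    ("NM_001080141", "chrX", 120082277, 120085594),
    ("NM_001080141", "chrX", 120096881, 120100198),
    ("NM_001080141", "chrX", 120092020, 120095337),
    ("NM_001080141", "chrX", 120116321, 120119638),
    ("NM_001080141", "chrX", 120067695, 120071012),
    ("NM_001080141", "chrX", 120072556, 120075873),
    ("NM_001080141", "chrX", 120101741, 120105058),
    ("NM_001080141", "chrX", 120106601, 120109918),
    ("NM_001080141", "chrX", 120111461, 120114778),
]


def loci_by_accession(rows):
    """Collect the distinct genomic placements seen for each accession."""
    loci = defaultdict(set)
    for accession, chrom, start, end in rows:
        loci[accession].add((chrom, start, end))
    return loci


def non_unique(rows):
    """Return only the accessions that are placed at more than one locus."""
    return {
        accession: sorted(placements)
        for accession, placements in loci_by_accession(rows).items()
        if len(placements) > 1
    }


dups = non_unique(rows)
for accession, placements in dups.items():
    print(accession, len(placements))
```

Running the same functions over a full table of placements instead of this short list would give the total number of affected accessions.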

Which gene database should I use if I want a unique ID for every gene and isoform?

Thanks!

refseq identifiers gene • 3.6k views

ADD COMMENT • link updated 13.3 years ago by David Quigley 11k • written 13.7 years ago by Luqman ▴ 30

score 2 · Answer 1 · 2011-10-20

2

13.7 years ago

Larry_Parnell 16k

That all of the above examples map to the X chromosome is not a concern to me. There are several segments of X that are duplicated - this is part of the biology of a single X in males. In fact, there is a segment of Y that matches at nearly 100% sequence identity to a segment of X.

score 1 · Answer 2 · 2011-10-20

1

13.7 years ago

Pierre Lindenbaum 166k

Use the UCSC knownGene database, where one identifier = one genomic position.

http://bioinformatics.oxfordjournals.org/content/22/9/1036.full

or Ensembl genes: http://genome.cshlp.org/content/14/5/942.abstract

0

It is useful that UCSC provides an N:1 mapping to Entrez Gene IDs.

ADD REPLY • link 13.7 years ago by Marcin Cieslik ▴ 520

0

@Marcin Yes, it is the table kgXref (see http://bioinformatics.oxfordjournals.org/content/22/9/1036.full).

ADD REPLY • link 13.7 years ago by Pierre Lindenbaum 166k

score 1 · Answer 3 · 2012-03-22

If you grab the sequence for NM_001080141.1 (http://www.ncbi.nlm.nih.gov/nuccore/121949793?report=fasta) and then BLAT it against the human genome, you'll find numerous perfect matches. If you picked only one locus for this sequence, it would be an arbitrary choice. UCSC knownGene picks a single entry from the list (120092019-120095337), but it's not obvious to me why this particular locus was picked. While this is annoying, it's the biology of the sequence.
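
If you would rather keep every placement than pick one arbitrarily, one pragmatic workaround is to key each record on the accession plus its locus. This is only a sketch of that idea, not something RefSeq or UCSC provides: the `Placement` type, the `placement_key` helper, and the `accession@chrom:start-end` format are all made up for illustration, and the coordinates are three of those quoted in the question.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """One genomic placement of a transcript accession (hypothetical type)."""
    accession: str
    chrom: str
    start: int
    end: int


def placement_key(p: Placement) -> str:
    """Build a key that stays unique when an accession maps to several loci."""
    return f"{p.accession}@{p.chrom}:{p.start}-{p.end}"


placements = [
    Placement("NM_001080141", "chrX", 120077416, 120080733),
    Placement("NM_001080141", "chrX", 120082277, 120085594),
    Placement("NM_001080141", "chrX", 120092020, 120095337),
]

# One key per placement, even though the accession is identical.
keys = [placement_key(p) for p in placements]
for key in keys:
    print(key)
```

The accession alone still tells you which transcript sequence is involved; the suffix only distinguishes the copies on the genome.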