I found out that the UCSC RefSeq transcript IDs do not carry a version number. The rows below are all the "same" transcript (from the UCSC refGene table), yet the positions and exon counts are wildly different. According to NCBI there are seven versions of this particular one: http://www.ncbi.nlm.nih.gov/nuccore/NM_000500
id | strand | start | end | coding_start | coding_end | coding_start_status | coding_end_status | exon_count | refseq_id | gene_id
-------+--------+----------+----------+--------------+------------+---------------------+-------------------+------------+-----------+---------
2438 | + | 3355551 | 3356557 | 3355551 | 3356021 | incmpl | cmpl | 2 | NM_000500 | 5448
2439 | + | 3385938 | 3389289 | 3386045 | 3388753 | cmpl | cmpl | 10 | NM_000500 | 5448
2440 | + | 3267147 | 3270502 | 3267254 | 3269966 | cmpl | cmpl | 10 | NM_000500 | 5448
2441 | + | 3306069 | 3342156 | 3306176 | 3341620 | cmpl | cmpl | 10 | NM_000500 | 5448
2442 | + | 3306853 | 3309423 | 3306853 | 3308887 | incmpl | cmpl | 9 | NM_000500 | 5448
16023 | + | 31973359 | 31976713 | 31973466 | 31976177 | cmpl | cmpl | 11 | NM_000500 | 5448
16024 | + | 3476744 | 3480099 | 3476851 | 3479563 | cmpl | cmpl | 10 | NM_000500 | 5448
16025 | + | 3258942 | 3262296 | 3259049 | 3261760 | cmpl | cmpl | 11 | NM_000500 | 5448
16026 | + | 3285309 | 3288664 | 3285416 | 3288128 | cmpl | cmpl | 10 | NM_000500 | 5448
16061 | + | 32006093 | 32009448 | 32006200 | 32008912 | cmpl | cmpl | 10 | NM_000500 | 5448
Does UCSC keep track of the RefSeq versions for the refGene table?
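For anyone who wants to check their own copy of the table for the same pattern, here is a minimal sketch in Python. It assumes the rows have already been pulled out of the database as tuples; the `summarize` function and the tuple layout are my own illustration, not part of any UCSC tool. It groups rows by unversioned accession and reports how many distinct spans and exon counts each accession has.

```python
from collections import defaultdict

# Rows copied from the table above, reduced to the columns that matter here:
# (id, start, end, exon_count, refseq_id)
rows = [
    (2438, 3355551, 3356557, 2, "NM_000500"),
    (2439, 3385938, 3389289, 10, "NM_000500"),
    (2440, 3267147, 3270502, 10, "NM_000500"),
    (2441, 3306069, 3342156, 10, "NM_000500"),
    (2442, 3306853, 3309423, 9, "NM_000500"),
    (16023, 31973359, 31976713, 11, "NM_000500"),
    (16024, 3476744, 3480099, 10, "NM_000500"),
    (16025, 3258942, 3262296, 11, "NM_000500"),
    (16026, 3285309, 3288664, 10, "NM_000500"),
    (16061, 32006093, 32009448, 10, "NM_000500"),
]


def summarize(rows):
    """Group rows by RefSeq accession and summarize how much they disagree."""
    groups = defaultdict(list)
    for row_id, start, end, exon_count, refseq_id in rows:
        groups[refseq_id].append((row_id, start, end, exon_count))

    summary = {}
    for refseq_id, entries in groups.items():
        summary[refseq_id] = {
            "n_rows": len(entries),
            # Number of distinct (start, end) spans for this accession
            "n_distinct_spans": len({(s, e) for _, s, e, _ in entries}),
            # Sorted list of the distinct exon counts seen
            "exon_counts": sorted({n for _, _, _, n in entries}),
        }
    return summary


summary = summarize(rows)
for refseq_id, info in summary.items():
    print(refseq_id, info)
```

Any accession with `n_rows` greater than 1 and more than one distinct span is a candidate for the ambiguity described above.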
For clarity and completeness, can you please explain how you obtained this result from the UCSC refGene table?
We loaded the data into a table of our own definition, with internal surrogate keys.