Why Does the UCSC RefSeq Gene Annotation File Have More Than One Annotation for the Same Transcript (NM ID)?
12.7 years ago

Transcript IDs are not unique in the hg19 RefSeq gene annotation file. I found many IDs that occur more than once, with different start and end positions on the same chromosome and strand. Does this mean the gene is duplicated? Do duplicate genes share the same ID/name?

ucsc refseq transcript • 6.3k views

Can you give an example?

12.7 years ago

As far as I can see, some refGene records have also been mapped to the "alternative haplotype" chromosomes. See http://genome.ucsc.edu/FAQ/FAQdownloads#download10

    $ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
         select * from refGene where name="NM_000593"  limit 10\G'
    *************************** 1. row ***************************
             bin: 835
            name: NM_000593
           chrom: chr6
          strand: -
         txStart: 32812985
           txEnd: 32821748
        cdsStart: 32813355
          cdsEnd: 32821593
       exonCount: 11
      exonStarts: 32812985,32814844,32815289,32815695,32816428,32816766,32818096,32818720,32819885,32820164,32820815,
        exonEnds: 32813562,32814981,32815452,32815869,32816617,32816895,32818294,32818926,32820016,32820279,32821748,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 2. row ***************************
             bin: 616
            name: NM_000593
           chrom: chr6_apd_hap1
          strand: -
         txStart: 4099989
           txEnd: 4108752
        cdsStart: 4100359
          cdsEnd: 4108597
       exonCount: 11
      exonStarts: 4099989,4101848,4102293,4102699,4103432,4103770,4105098,4105722,4106889,4107168,4107819,
        exonEnds: 4100566,4101985,4102456,4102873,4103621,4103899,4105296,4105928,4107020,4107283,4108752,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 3. row ***************************
             bin: 617
            name: NM_000593
           chrom: chr6_cox_hap2
          strand: -
         txStart: 4257513
           txEnd: 4266276
        cdsStart: 4257883
          cdsEnd: 4266121
       exonCount: 11
      exonStarts: 4257513,4259372,4259817,4260223,4260956,4261294,4262624,4263248,4264413,4264692,4265343,
        exonEnds: 4258090,4259509,4259980,4260397,4261145,4261423,4262822,4263454,4264544,4264807,4266276,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 4. row ***************************
             bin: 616
            name: NM_000593
           chrom: chr6_dbb_hap3
          strand: -
         txStart: 4094363
           txEnd: 4103126
        cdsStart: 4094733
          cdsEnd: 4102971
       exonCount: 11
      exonStarts: 4094363,4096222,4096667,4097073,4097806,4098144,4099472,4100096,4101263,4101542,4102193,
        exonEnds: 4094940,4096359,4096830,4097247,4097995,4098273,4099670,4100302,4101394,4101657,4103126,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 5. row ***************************
             bin: 617
            name: NM_000593
           chrom: chr6_ssto_hap7
          strand: -
         txStart: 4243758
           txEnd: 4252521
        cdsStart: 4244128
          cdsEnd: 4252366
       exonCount: 11
      exonStarts: 4243758,4245617,4246062,4246468,4247201,4247539,4248867,4249491,4250658,4250937,4251588,
        exonEnds: 4244335,4245754,4246225,4246642,4247390,4247668,4249065,4249697,4250789,4251052,4252521,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 6. row ***************************
             bin: 617
            name: NM_000593
           chrom: chr6_mann_hap4
          strand: -
         txStart: 4270181
           txEnd: 4278944
        cdsStart: 4270551
          cdsEnd: 4278789
       exonCount: 11
      exonStarts: 4270181,4272040,4272485,4272891,4273624,4273962,4275292,4275916,4277081,4277360,4278011,
        exonEnds: 4270758,4272177,4272648,4273065,4273813,4274091,4275490,4276122,4277212,4277475,4278944,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 7. row ***************************
             bin: 616
            name: NM_000593
           chrom: chr6_mcf_hap5
          strand: -
         txStart: 4149862
           txEnd: 4158625
        cdsStart: 4150232
          cdsEnd: 4158470
       exonCount: 11
      exonStarts: 4149862,4151721,4152166,4152572,4153305,4153643,4154971,4155595,4156762,4157041,4157692,
        exonEnds: 4150439,4151858,4152329,4152746,4153494,4153772,4155169,4155801,4156893,4157156,4158625,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,
    *************************** 8. row ***************************
             bin: 615
            name: NM_000593
           chrom: chr6_qbl_hap6
          strand: -
         txStart: 4045092
           txEnd: 4053855
        cdsStart: 4045462
          cdsEnd: 4053700
       exonCount: 11
      exonStarts: 4045092,4046951,4047396,4047802,4048535,4048873,4050203,4050827,4051992,4052271,4052922,
        exonEnds: 4045669,4047088,4047559,4047976,4048724,4049002,4050401,4051033,4052123,4052386,4053855,
           score: 0
           name2: TAP1
    cdsStartStat: cmpl
      cdsEndStat: cmpl
      exonFrames: 0,1,0,0,0,0,0,1,2,1,0,

Edit: your NM_012151 maps to multiple locations on chrX because its position is ambiguous: chrX is full of segmental duplications.
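
To tell the two cases apart in a downloaded annotation file, one option is to group the rows by transcript ID and look at which chromosomes each ID lands on. The following is only a minimal sketch, assuming a tab-separated refGene.txt dump with the same column order as the query output above (bin, name, chrom, strand, txStart, txEnd, ...) and assuming that hg19 alternate haplotype names contain `_hap`, as in `chr6_apd_hap1`. The function name and the two labels are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def classify_duplicates(lines):
    """Group refGene rows by transcript ID and label why an ID repeats.

    Assumes tab-separated rows: bin, name, chrom, strand, txStart, txEnd, ...
    Returns {transcript_id: label} for IDs that occur more than once.
    """
    placements = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name, chrom = fields[1], fields[2]
        placements[name].append(chrom)

    report = {}
    for name, chroms in placements.items():
        if len(chroms) < 2:
            continue  # unique ID, nothing to report
        # Assumption: alternate haplotype contigs carry "_hap" in their name.
        primary = [c for c in chroms if "_hap" not in c]
        if len(primary) > 1:
            # More than one placement on the primary assembly,
            # e.g. NM_012151 on chrX.
            report[name] = "multi_primary"
        else:
            # The extra rows come only from haplotype contigs,
            # e.g. NM_000593 on chr6 and chr6_*_hap*.
            report[name] = "haplotype"
    return report
```

Calling `classify_duplicates(open("refGene.txt"))` (file name assumed) would return a dictionary in which NM_000593 is labelled `haplotype` and NM_012151 is labelled `multi_primary`, provided the file contains the rows shown in this thread.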


Why don't you post an example?


Besides these haplotype records, there are also some duplicated records on the normal chromosomes.


I have posted examples; I don't know whether you are able to see them.

1760    NM_012151       chrX    +       154114634       154116336       154114649       154115765       1       154114634,      154116336,      0       F8A1    cmpl    cmpl    0,
1764    NM_012151       chrX    +       154611748       154613450       154611763       154612879       1       154611748,      154613450,      0       F8A1    cmpl    cmpl    0,
1765    NM_012151       chrX    -       154686574       154688276       154687145       154688261       1       154686574,      154688276,      0       F8A1    cmpl    cmp

Is it reasonable to filter out these haplotype records when working with the genome sequence and annotation file?
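
Whether filtering is appropriate likely depends on whether the reference sequence in use includes the haplotype contigs. If it does not, dropping the rows could look like the sketch below. It assumes a tab-separated refGene.txt with the chromosome name in the third column and assumes that haplotype contigs can be recognised by `_hap` in their name. The function name and the `marker` parameter are hypothetical.

```python
def drop_haplotype_rows(lines, marker="_hap"):
    """Yield refGene rows whose chrom field does not contain `marker`.

    Assumes tab-separated rows with the chromosome name in column 3.
    Rows with fewer than three fields are dropped as malformed.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2 and marker not in fields[2]:
            yield line
```

Note that this only removes the haplotype copies. Transcripts placed more than once on a primary chromosome, such as the NM_012151 rows on chrX above, would still appear several times and would need a separate decision.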
