Question

Where Can I Find Human Refseq That Come From The Same Transcription Locus?

14.0 years ago

nikulina ▴ 300

Good day!

Where can I find human RefSeq transcripts for hg18 that come from the same transcription locus (gene), so that RefSeqs with overlapping coordinates are clustered into one gene? Is there a dedicated database that can be downloaded?

Thank you in advance!

refseq • 3.6k views

updated 14.0 years ago by Andrew Su 4.9k • written 14.0 years ago by nikulina ▴ 300

Ram · Answer 1 · 2010-11-14

The RefSeq transcripts are stored in the refGene table of the UCSC public MySQL server. The name2 field holds the gene name, so it can be used to get all the transcripts for the same gene.

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18

mysql> select * from refGene as G where chrom="chr1" and txStart > 1000000 and txEnd < 2000000 limit 2\G
*************************** 1. row ***************************
         bin: 592
        name: NM_017891
       chrom: chr1
      strand: -
     txStart: 1007060
       txEnd: 1041599
    cdsStart: 1008135
      cdsEnd: 1016786
   exonCount: 10
  exonStarts: 1007060,1009595,1009723,1011120,1012381,1012744,1015595,1016714,1017233,1041302,
    exonEnds: 1008230,1009626,1009749,1011255,1012447,1012840,1015671,1016808,1017346,1041599,
          id: 0
       name2: C1orf159
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: 1,0,1,1,1,1,0,0,-1,-1,
*************************** 2. row ***************************
         bin: 593
        name: NR_029639
       chrom: chr1
      strand: +
     txStart: 1092346
       txEnd: 1092441
    cdsStart: 1092441
      cdsEnd: 1092441
   exonCount: 1
  exonStarts: 1092346,
    exonEnds: 1092441,
          id: 0
       name2: MIR200B
cdsStartStat: unk
  cdsEndStat: unk
  exonFrames: -1,
2 rows in set (0.20 sec)

The refGene table can also be downloaded here
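
If you work from a downloaded copy of the table rather than the live server, grouping by name2 can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the dump is tab-separated with the columns in the same order as the mysql output above, and the function name `group_by_gene` is made up for illustration. The two sample rows are the ones returned by the query.

```python
import io
from collections import defaultdict

# Column order as shown in the mysql output above (assumed to match the dump).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "id", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def group_by_gene(handle):
    """Map each name2 value to a list of (accession, chrom, strand, txStart, txEnd)."""
    genes = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        row = dict(zip(REFGENE_COLUMNS, line.split("\t")))
        genes[row["name2"]].append(
            (row["name"], row["chrom"], row["strand"],
             int(row["txStart"]), int(row["txEnd"]))
        )
    return dict(genes)


# The two rows from the query above, written as tab-separated lines.
SAMPLE = "\n".join([
    "\t".join([
        "592", "NM_017891", "chr1", "-", "1007060", "1041599",
        "1008135", "1016786", "10",
        "1007060,1009595,1009723,1011120,1012381,1012744,1015595,1016714,1017233,1041302,",
        "1008230,1009626,1009749,1011255,1012447,1012840,1015671,1016808,1017346,1041599,",
        "0", "C1orf159", "cmpl", "cmpl", "1,0,1,1,1,1,0,0,-1,-1,",
    ]),
    "\t".join([
        "593", "NR_029639", "chr1", "+", "1092346", "1092441",
        "1092441", "1092441", "1", "1092346,", "1092441,",
        "0", "MIR200B", "unk", "unk", "-1,",
    ]),
])

for gene, transcripts in group_by_gene(io.StringIO(SAMPLE)).items():
    print(gene, [t[0] for t in transcripts])
```

For a real file you would pass an open file handle instead of the `io.StringIO` sample. If the same gene name turns up at more than one location, keying on (name2, chrom, strand) instead of name2 alone may be the safer choice.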

score 1 · Answer 2 · 2010-11-15

14.0 years ago

Andrew Su 4.9k

For a small number of genes, the most straightforward method to me seems to be just to check the Entrez Gene page (e.g., CDK2).

To do it programmatically, you can use the mysql method that Pierre describes. Or you can download the gene2refseq file from NCBI's ftp site. The columns are pretty self-explanatory but are nevertheless described in the README file.
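
A rough sketch of the gene2refseq route in Python follows. It assumes the file is tab-separated with tax_id, GeneID, status and RNA_nucleotide_accession.version as the first four columns and "-" for missing values; please check the README before relying on that. The function name `refseqs_per_gene` is hypothetical, and the sample rows are illustrative placeholders, not lines copied from the real file.

```python
import io
from collections import defaultdict


def refseqs_per_gene(handle, tax_id="9606"):
    """Collect RNA accessions (version stripped) per GeneID for one taxon.

    Column positions are an assumption; verify them against the README.
    """
    genes = defaultdict(set)
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip the header line and empty lines
        fields = line.split("\t")
        if fields[0] != tax_id:
            continue  # keep only the requested organism (9606 = human)
        gene_id, rna_acc = fields[1], fields[3]
        if rna_acc != "-":  # "-" is assumed to mark a missing accession
            genes[gene_id].add(rna_acc.split(".")[0])
    return {gene_id: sorted(accs) for gene_id, accs in genes.items()}


# Illustrative placeholder rows, truncated to the first four columns.
SAMPLE = "\n".join([
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version",
    "9606\t1017\tREVIEWED\tNM_001798.5",
    "9606\t1017\tREVIEWED\tNM_052827.4",
    "9606\t1017\tREVIEWED\t-",
    "10090\t12566\tVALIDATED\tNM_016756.4",
])

print(refseqs_per_gene(io.StringIO(SAMPLE)))
```

Grouping on GeneID rather than on coordinates means transcripts are clustered by NCBI's gene assignment, so two overlapping transcripts from different genes stay separate.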

Thank you! In the end I wrote a small Perl script to merge the overlapping RefSeqs into intervals.

14.0 years ago by nikulina ▴ 300
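
For anyone landing here later, the interval-merging idea from the reply above can be sketched as follows. This is not the poster's Perl script, just one possible Python version. It assumes UCSC-style 0-based half-open coordinates and that transcripts on opposite strands should not be merged; both are assumptions you may want to change. The name `merge_overlapping` and the third example transcript are hypothetical.

```python
from collections import defaultdict


def merge_overlapping(transcripts):
    """Cluster transcripts whose intervals overlap on the same chrom and strand.

    transcripts: iterable of (name, chrom, strand, start, end).
    Returns a list of (chrom, strand, start, end, [names]) clusters.
    """
    by_key = defaultdict(list)
    for name, chrom, strand, start, end in transcripts:
        by_key[(chrom, strand)].append((start, end, name))

    clusters = []
    for (chrom, strand), items in sorted(by_key.items()):
        items.sort()  # sort by start, then end
        cur_start, cur_end, names = items[0][0], items[0][1], [items[0][2]]
        for start, end, name in items[1:]:
            if start < cur_end:
                # Half-open intervals overlap when the next start is before
                # the current end; extend the running cluster.
                cur_end = max(cur_end, end)
                names.append(name)
            else:
                clusters.append((chrom, strand, cur_start, cur_end, names))
                cur_start, cur_end, names = start, end, [name]
        clusters.append((chrom, strand, cur_start, cur_end, names))
    return clusters


EXAMPLE = [
    ("NM_017891", "chr1", "-", 1007060, 1041599),        # from the query above
    ("NR_029639", "chr1", "+", 1092346, 1092441),        # from the query above
    ("hypothetical_tx", "chr1", "-", 1040000, 1050000),  # made up to show a merge
]

for cluster in merge_overlapping(EXAMPLE):
    print(cluster)
```

Note that merging purely by overlap can fuse neighbouring genes whose transcripts overlap, which grouping by name2 or GeneID would keep apart.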