Where Can I Find Human RefSeq Transcripts That Come From The Same Transcription Locus?
14.1 years ago
nikulina ▴ 300

Good day!

Where can I find human RefSeq transcripts grouped by transcription locus (gene) for hg18, so that RefSeqs with overlapping coordinates are clustered into one gene? Is there a database for this that can be downloaded?

Thank you in advance!

14.1 years ago

The RefSeq transcripts are stored in the refGene table of the UCSC MySQL server. The name2 field holds the gene symbol, so it can be used to get all the transcripts for the same gene.

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18

mysql> select * from refGene as G where chrom="chr1" and txStart > 1000000 and txEnd < 2000000 limit 2\G
*************************** 1. row ***************************
         bin: 592
        name: NM_017891
       chrom: chr1
      strand: -
     txStart: 1007060
       txEnd: 1041599
    cdsStart: 1008135
      cdsEnd: 1016786
   exonCount: 10
  exonStarts: 1007060,1009595,1009723,1011120,1012381,1012744,1015595,1016714,1017233,1041302,
    exonEnds: 1008230,1009626,1009749,1011255,1012447,1012840,1015671,1016808,1017346,1041599,
          id: 0
       name2: C1orf159
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: 1,0,1,1,1,1,0,0,-1,-1,
*************************** 2. row ***************************
         bin: 593
        name: NR_029639
       chrom: chr1
      strand: +
     txStart: 1092346
       txEnd: 1092441
    cdsStart: 1092441
      cdsEnd: 1092441
   exonCount: 1
  exonStarts: 1092346,
    exonEnds: 1092441,
          id: 0
       name2: MIR200B
cdsStartStat: unk
  cdsEndStat: unk
  exonFrames: -1,
2 rows in set (0.20 sec)

The refGene table can also be downloaded here
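Once the table is available as a tab-separated dump, grouping by name2 takes only a few lines. The following is a minimal sketch, not part of the original answer: it assumes the columns appear in the same order as the fields in the query output above (so name2 is the 13th column), and the function name `group_by_gene` is made up for illustration. The exon and frame columns are replaced by "." to keep the sample short.

```python
from collections import defaultdict

# 0-based column positions, assuming the field order of the query output above.
NAME, CHROM, STRAND, TX_START, TX_END, NAME2 = 1, 2, 3, 4, 5, 12


def group_by_gene(lines):
    """Map each name2 value (gene symbol) to the list of its transcripts."""
    genes = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("\t")
        genes[f[NAME2]].append(
            (f[NAME], f[CHROM], f[STRAND], int(f[TX_START]), int(f[TX_END]))
        )
    return dict(genes)


# Rows 1 and 2 come from the query output above. Row 3 ("NM_EXAMPLE") is a
# made-up transcript, added only to show two transcripts under one gene.
sample = [
    "592\tNM_017891\tchr1\t-\t1007060\t1041599\t1008135\t1016786\t10\t.\t.\t0\tC1orf159\tcmpl\tcmpl\t.",
    "593\tNR_029639\tchr1\t+\t1092346\t1092441\t1092441\t1092441\t1\t.\t.\t0\tMIR200B\tunk\tunk\t.",
    "592\tNM_EXAMPLE\tchr1\t-\t1007060\t1030000\t1008135\t1016786\t8\t.\t.\t0\tC1orf159\tcmpl\tcmpl\t.",
]

genes = group_by_gene(sample)
for symbol, transcripts in sorted(genes.items()):
    print(symbol, [t[0] for t in transcripts])
```

Note that the same symbol can be placed on more than one chromosome, so keying on (name2, chrom, strand) instead of name2 alone may be safer if one entry per locus is needed.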


The file suggested by Pierre can be uploaded to Galaxy. A second file containing your genes/loci of interest can then be joined against it to pull out all the corresponding RefSeq transcripts, using 'Join, Subtract and Group' > 'Join two Queries'.
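Outside Galaxy, the same kind of join can be approximated in a few lines of Python. This is only a rough equivalent and says nothing about how Galaxy implements the tool; the function name `join_on_gene` and the dictionary layout are assumptions made for this sketch. The two transcripts are the ones shown in Pierre's query output.

```python
def join_on_gene(transcripts, genes_of_interest):
    """Keep the transcripts whose gene symbol is in the list of interest."""
    wanted = set(genes_of_interest)
    return [t for t in transcripts if t["name2"] in wanted]


# First "file": transcripts with their gene symbol (name2 in refGene).
transcripts = [
    {"name": "NM_017891", "name2": "C1orf159"},
    {"name": "NR_029639", "name2": "MIR200B"},
]

# Second "file": the genes of interest.
genes_of_interest = ["MIR200B"]

joined = join_on_gene(transcripts, genes_of_interest)
print([t["name"] for t in joined])
```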

14.1 years ago
Andrew Su 4.9k

For a small number of genes, the most straightforward method to me seems to be just to check the Entrez Gene page (e.g., CDK2).

To do it programmatically, you can use the MySQL method that Pierre describes, or you can download the gene2refseq file from NCBI's FTP site. The columns are fairly self-explanatory, but they are also described in the README file.
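A minimal sketch of the gene2refseq route is shown below. It assumes the file is tab-separated with the taxonomy ID in column 1, the GeneID in column 2 and the RNA accession.version in column 4; please check these positions against the README before relying on them. The sample rows are typed in by hand for illustration (only the first four columns are shown, and the version numbers may not match the real file), and `refseqs_per_gene` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumed 0-based column positions; verify against the README.
TAX_ID, GENE_ID, RNA_ACC = 0, 1, 3


def refseqs_per_gene(lines, tax_id="9606"):
    """Map GeneID -> sorted RefSeq RNA accessions (version stripped)."""
    genes = defaultdict(set)
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        f = line.rstrip("\n").split("\t")
        acc = f[RNA_ACC]
        # Skip other organisms and rows without an RNA accession ("-").
        if f[TAX_ID] != tax_id or acc == "-":
            continue
        # A set is used because an accession can appear on several rows.
        genes[f[GENE_ID]].add(acc.split(".")[0])
    return {gene: sorted(accs) for gene, accs in genes.items()}


# Illustrative rows only, not copied from the real file.
sample = [
    "#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version",
    "9606\t1017\tREVIEWED\tNM_001798.5",
    "9606\t1017\tREVIEWED\tNM_052827.4",
    "9606\t1017\tREVIEWED\tNM_001798.5",
    "9606\t1017\tREVIEWED\t-",
    "10090\t12566\tVALIDATED\tNM_016756.4",
]

by_gene = refseqs_per_gene(sample)
print(by_gene)
```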


Thank you! In the end I wrote some Perl to create merged intervals from the overlapping RefSeqs.
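The Perl code was not posted, so the following is not the poster's script, only a Python sketch of one common way to merge overlapping intervals: sort, then sweep. It assumes UCSC-style half-open coordinates and that transcripts on opposite strands should stay separate; both are assumptions, and the second interval in the sample is made up to produce an overlap.

```python
def merge_intervals(intervals):
    """Merge overlapping (chrom, strand, start, end) intervals.

    Intervals are merged only within the same chromosome and strand.
    """
    merged = []
    for chrom, strand, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and last[1] == strand and start < last[3]:
            # Overlaps the current cluster: extend its end if needed.
            last[3] = max(last[3], end)
        else:
            # No overlap: start a new cluster.
            merged.append([chrom, strand, start, end])
    return [tuple(m) for m in merged]


intervals = [
    ("chr1", "-", 1007060, 1041599),  # NM_017891, from the query output
    ("chr1", "-", 1030000, 1050000),  # made-up overlapping transcript
    ("chr1", "+", 1092346, 1092441),  # NR_029639, from the query output
]

clusters = merge_intervals(intervals)
for cluster in clusters:
    print(cluster)
```

Dropping the strand from the comparison would cluster overlapping transcripts regardless of orientation, which may or may not be what "same locus" should mean for a given analysis.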
