Format RefSeq Output Obtained From UCSC Table Browser?
13.3 years ago
User 3035

I want to get the list of human RefSeq genes from the UCSC Table Browser. As you know, the RefSeq file that the Table Browser returns identifies each entry by its RefSeq transcript accession (for example NR_028269) rather than by gene name.

Is there a way to modify this output so that it gives gene symbols instead of these RefSeq accession IDs?

13.3 years ago
brentp

As @Pierre notes, the gene symbol you want is in the name2 column of the refGene table. If you want BED output straight from an SQL query against UCSC's public MySQL server, you can use something like:

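#!/bin/sh
# Usage: sh refGene.sh <UCSC database name>, e.g. hg19 or mm9
ORG=${1:?"usage: sh refGene.sh <database, e.g. hg19>"}

# Fetch transcripts from UCSC's public MySQL server; name2 holds the gene symbol.
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D "$ORG" -P 3306 \
    -e "select chrom,txStart,txEnd,name2 as name,strand,exonStarts,exonEnds from refGene;" > "$ORG.notbed"

# Convert to BED12. exonStarts/exonEnds are comma-separated absolute coordinates
# with a trailing comma; BED12 needs block sizes and block starts relative to txStart.
awk '
        BEGIN { OFS = "\t"; FS = "\t" }
        NR != 1 {                                   # skip the column-header line
                n = split($6, astarts, ",")
                split($7, aends, ",")
                starts = ""; sizes = ""; exonCount = 0
                for (i = 1; i <= n; i++) {
                    if (astarts[i] == "") continue  # empty field after the trailing comma
                    sizes  = sizes  (aends[i] - astarts[i]) ","
                    starts = starts (astarts[i] - $2) ","
                    exonCount++
                }
                # chrom start end name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts
                # (thickStart/thickEnd are simply set to txStart/txEnd here)
                print $1, $2, $3, $4, 1, $5, $2, $3, "0", exonCount, sizes, starts
        }' "$ORG.notbed" | sort -k1,1 -k2,2n > "refGene.$ORG.bed"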

which you can save as refGene.sh and use as

sh refGene.sh hg19

or

sh refGene.sh mm9
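To see what the awk step in the script above does to a single row, here is a sketch that feeds it one hypothetical two-exon transcript (toy coordinates, not real UCSC data), preceded by the column-header line that mysql prints. It assumes a POSIX sh and awk; no database connection is needed.

```shell
#!/bin/sh
# Toy input: header line + one made-up transcript with exons 100-200 and 300-500.
bed=$(printf 'chrom\ttxStart\ttxEnd\tname\tstrand\texonStarts\texonEnds\nchr1\t100\t500\tGENE1\t+\t100,300,\t200,500,\n' |
awk 'BEGIN { OFS = "\t"; FS = "\t" }
    NR != 1 {                                  # skip the header line
        n = split($6, astarts, ",")
        split($7, aends, ",")
        starts = ""; sizes = ""; exonCount = 0
        for (i = 1; i <= n; i++) {
            if (astarts[i] == "") continue     # trailing comma leaves an empty field
            sizes  = sizes  (aends[i] - astarts[i]) ","
            starts = starts (astarts[i] - $2) ","
            exonCount++
        }
        print $1, $2, $3, $4, 1, $5, $2, $3, "0", exonCount, sizes, starts
    }')
printf '%s\n' "$bed"
```

Exon 1 has size 200 - 100 = 100 and offset 100 - 100 = 0; exon 2 has size 500 - 300 = 200 and offset 300 - 100 = 200. So the line ends with blockCount 2, blockSizes "100,200," and blockStarts "0,200,", which is what BED12 expects.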

This is very helpful. Thanks!


Thank you so much!

13.3 years ago

Using UCSC's public MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
     -e 'select name2 from refGene where name="NR_028269"'
+--------------+
| name2        |
+--------------+
| LOC100288778 |
+--------------+
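To convert a whole list of accessions at once rather than one query per ID, one option is a small awk hash join against a two-column (name, name2) table, for example one saved with `select name, name2 from refGene` or from the Table Browser. This is a sketch under assumptions: the lookup rows are copied from example output in this thread, and `ids.txt` plus the ID `NOT_A_REAL_ID` are hypothetical. Accessions must match exactly, including any version suffix such as ".3".

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical lookup table: accession <TAB> gene symbol (rows from this thread).
printf 'NM_032291\tSGIP1\nNR_028269\tLOC100288778\nNR_026823\tFAM138D\n' > "$tmp/map.tsv"
# Hypothetical list of accessions to convert; the last one is deliberately absent.
printf 'NR_028269\nNM_032291\nNOT_A_REAL_ID\n' > "$tmp/ids.txt"

# First file (NR == FNR): remember accession -> symbol.
# Second file: print each accession with its symbol, or NA if it is not found.
result=$(awk 'BEGIN { FS = OFS = "\t" }
    NR == FNR { sym[$1] = $2; next }
    { print $1, (($1 in sym) ? sym[$1] : "NA") }' "$tmp/map.tsv" "$tmp/ids.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

The output keeps the order of `ids.txt`, so it can be pasted next to the original list.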
13.3 years ago
Gjain

Another way is to do it directly in the Table Browser. In the "output format" field:

  1. choose "selected fields from primary and related tables"
  2. click "get output"
  3. if you are on hg19, you will see "Select Fields from hg19.refGene"
  4. check name, chrom, strand, txStart, txEnd and name2
  5. click "get output" again

The output should look like this:

+--------------+-------+--------+----------+----------+--------------+
|    #name     | chrom | strand | txStart  |  txEnd   |    name2     |
+--------------+-------+--------+----------+----------+--------------+
| NM_032291    | chr1  | +      | 66999824 | 67210768 | SGIP1        |
| NM_001080397 | chr1  | +      |  8384389 |  8404227 | SLC45A1      |
| NM_001145277 | chr1  | +      | 16767166 | 16786584 | NECAP2       |
| NR_028269    | chr12 | +      |    87983 |    91263 | LOC100288778 |
| NR_026823    | chr12 | -      |   147945 |   149412 | FAM138D      |
| NR_033859    | chr12 | -      |   246576 |   258332 | LOC574538    |
+--------------+-------+--------+----------+----------+--------------+

Hope this helps.
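If you then want a BED file named by gene symbol, the selected-fields output can be reordered with awk. This sketch assumes the downloaded output is tab-separated with a header line starting with "#" (as the "#name" column suggests); the two input rows are copied from the table above.

```shell
#!/bin/sh
# Columns as selected above: #name, chrom, strand, txStart, txEnd, name2.
tb=$(printf '#name\tchrom\tstrand\ttxStart\ttxEnd\tname2\nNM_032291\tchr1\t+\t66999824\t67210768\tSGIP1\nNR_026823\tchr12\t-\t147945\t149412\tFAM138D\n')

# Skip the "#" header, then emit BED6: chrom, start, end, symbol, score 0, strand.
bed=$(printf '%s\n' "$tb" | awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }
    { print $2, $4, $5, $6, 0, $3 }')
printf '%s\n' "$bed"
```

Table Browser coordinates for txStart/txEnd are already 0-based half-open, so they can be copied into BED columns unchanged.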

+----------------+-------+--------+----------+----------+----------+----------+-------+
| name           | chrom | strand | txStart  | txEnd    | cdsStart | cdsEnd   | name2 |
+----------------+-------+--------+----------+----------+----------+----------+-------+
| NM_001293562.1 | chr1  | +      | 33546713 | 33586132 | 33547850 | 33585783 | AZIN2 |
| NM_052998.3    | chr1  | +      | 33546713 | 33586132 | 33547850 | 33585783 | AZIN2 |
| NM_001301824.1 | chr1  | +      | 33546729 | 33586132 | 33557656 | 33585783 | AZIN2 |
| NM_001301823.1 | chr1  | +      | 33546729 | 33586132 | 33557656 | 33585783 | AZIN2 |
| NM_001301826.1 | chr1  | +      | 33547778 | 33567493 | 33547850 | 33567493 | AZIN2 |
| NR_126031.1    | chr1  | +      | 33547778 | 33567493 | 33567493 | 33567493 | AZIN2 |
| NM_001301825.1 | chr1  | +      | 33547778 | 33586132 | 33547850 | 33585783 | AZIN2 |
+----------------+-------+--------+----------+----------+----------+----------+-------+

How can I choose one set of coordinates for the gene AZIN2 from these repeated entries?


It depends on the biological question you are asking. You can choose the longest transcript, or any other criterion that fits your hypothesis.
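As one example of the "longest transcript" criterion, here is a sketch that keeps one row per gene symbol: the row with the largest genomic span (txEnd - txStart). Note this is an assumption about what "longest" means; the mRNA length (sum of exon sizes) can rank transcripts differently. The rows are copied from the AZIN2 example, with the column order name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, name2 inferred from the values shown there.

```shell
#!/bin/sh
rows='NM_001293562.1 chr1 + 33546713 33586132 33547850 33585783 AZIN2
NM_052998.3 chr1 + 33546713 33586132 33547850 33585783 AZIN2
NM_001301824.1 chr1 + 33546729 33586132 33557656 33585783 AZIN2
NM_001301823.1 chr1 + 33546729 33586132 33557656 33585783 AZIN2
NM_001301826.1 chr1 + 33547778 33567493 33547850 33567493 AZIN2
NR_126031.1 chr1 + 33547778 33567493 33567493 33567493 AZIN2
NM_001301825.1 chr1 + 33547778 33586132 33547850 33585783 AZIN2'

# Default FS splits on blanks or tabs, so tab-separated output works too.
# On a tie (NM_001293562.1 and NM_052998.3 here), the first row seen wins (strict >).
longest=$(printf '%s\n' "$rows" | awk '
    /^#/ { next }                     # skip a "#name ..." header if present
    {
        len = $5 - $4                 # genomic span = txEnd - txStart
        if (!($8 in best) || len > best[$8]) { best[$8] = len; row[$8] = $0 }
    }
    END { for (g in row) print row[g] }')
printf '%s\n' "$longest"
```

With many genes the END loop prints them in no particular order; pipe through `sort -k2,2 -k4,4n` if a sorted list is needed.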
