Extract Similar Genes From UCSC Via MySQL Command
13.1 years ago

Hello everyone,

I have a specific gene with a special feature: its first intron is very small (3% of the entire gene).

I would like to extract genes from UCSC (MySQL) that have the same ratio, regardless of their length. The most important thing is that the first intron still represents 3% of the gene length.

Any idea how to do this via the UCSC MySQL server, or do you think it has to be done programmatically?

Cheers

Rad

intron gene length • 2.8k views
13.1 years ago

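The following awk script prints, for each transcript, the size of the first intron as a fraction of the transcript length (txEnd - txStart), followed by the original row: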

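# Input: tab-separated rows of the UCSC knownGene table
# (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...)
BEGIN   {
    FS="\t";
    }

    {
    # exonStarts ($9) and exonEnds ($10) are comma-separated lists in genomic order
    split($9,exonStarts,",");
    split($10,exonEnds,",");
    # transcript length: txEnd - txStart
    geneSize=1.0*int($5)-int($4);
    exonCount=int($8);
    # single-exon transcripts have no intron: skip them
    if(exonCount<2)
        {
        next;
        }
    if($3=="+")
        {
        # forward strand: the first intron lies between exons 1 and 2
        printf("%f\t%s\n",(exonStarts[2]-exonEnds[1])/geneSize,$0);
        }
    else
        {
        # reverse strand: the first intron lies between the last two exons in genomic order
        printf("%f\t%s\n",(exonStarts[exonCount]-exonEnds[exonCount-1])/geneSize,$0);
        }
    }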

Example:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select * from knownGene' -N | awk -f script.awk

0.151814    uc001aaa.3  chr1    +   11873   14409   11873   11873   3   11873,12612,13220,  12227,12721,14409,      uc001aaa.3
0.144716    uc010nxq.1  chr1    +   11873   14409   12189   13639   3   11873,12594,13402,  12227,12721,14409,  B7ZGX9  uc010nxq.1
0.164826    uc010nxr.1  chr1    +   11873   14409   11873   11873   3   11873,12645,13220,  12227,12697,14409,      uc010nxr.1
0.276321    uc009vis.2  chr1    -   14362   16765   14362   14362   4   14362,14969,15795,16606,    14829,15038,15942,16765,        uc009vis.2
0.101167    uc009vit.2  chr1    -   14362   19759   14362   14362   9   14362,14969,15795,16606,16857,17232,17914,18267,18912,  14829,15038,15947,16765,17055,17742,18061,18366,19759,      uc009vit.2
0.101167    uc001aae.3  chr1    -   14362   19759   14362   14362   10  14362,14969,15795,16606,16857,17232,17605,17914,18267,18912,    14829,15038,15947,16765,17055,17368,17742,18061,18366,19759,        uc001aae.3
0.066333    uc009viu.2  chr1    -   14362   19759   14362   14362   10  14362,14969,15795,16606,16857,17232,17914,18267,18500,18912,    14829,15038,15947,16765,17055,17742,18061,18369,18554,19759,        uc009viu.2
0.603283    uc001aab.3  chr1    -   14362   24901   14362   14362   10  14362,14969,15795,16606,16853,17232,17605,17914,18267,24737,    14829,15038,15947,16765,17055,17368,17742,18061,18379,24901,        uc001aab.3
0.295109    uc001aah.3  chr1    -   14362   29370   14362   14362   11  14362,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320,  14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,29370,      uc001aah.3
0.295109    uc009vir.2  chr1    -   14362   29370   14362   14362   10  14362,14969,15795,16606,16857,17232,17914,18267,24737,29320,    14829,15038,15947,16765,17055,17742,18061,18366,24891,29370,        uc009vir.2
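
To get from these ratios to the genes asked for in the question (first intron at about 3% of the gene length), one option is to filter on the first output column. The sketch below is an assumption rather than part of the original answer: the function name filter_ratio, the default target of 0.03 and the tolerance of 0.005 are illustrative choices, and it expects the tab-separated output of script.awk on standard input.

```shell
# Hypothetical helper (not from the original answer): keep only rows whose
# first column, the first-intron/transcript-length ratio printed by
# script.awk, lies within target +/- tol.
#   $1 = target ratio (default 0.03), $2 = tolerance (default 0.005)
filter_ratio() {
    awk -F '\t' -v target="${1:-0.03}" -v tol="${2:-0.005}" '
        # absolute distance between this row ratio and the target
        { d = $1 - target; if (d < 0) d = -d }
        # a bare pattern prints the row when the condition holds
        d <= tol
    '
}

# Possible usage, appended to the pipeline from the example above:
#   mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
#       -e 'select * from knownGene' -N | awk -f script.awk | filter_ratio 0.03 0.005
```

Because knownGene holds one row per transcript, the same locus can show up several times in the filtered list, so some de-duplication may still be wanted afterwards.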

AWESOME! Thanks Pierre. Is there a way to get the gene length from the output? I will take a look at the description of the knownGene table to understand the columns. Thanks for your answer.


Oh yeah, I see it from your script! That's fine. Thank you.


Pierre, do you mind if I share your solution at biocoders.net as a small snippet tutorial?
