Number of exons per gene (hg38) to detect alternative splicing
7.5 years ago

Hello, and thanks for your attention.

I have been asked to build a table relating each gene to its number of exons, so I can compare it with my exon-level RNA-seq differential expression results. This is my output (the data I want to merge with the number of exons per gene):

 Gene.stable.ID  Exon.stable.ID     logFC    logCPM       LR       PValue
 ENSG00000108854 ENSE00002699144 -2.336265 6.1847281 36.61188 1.441502e-09
 ENSG00000266524 ENSE00002699841 -4.386720 1.5714714 16.81632 4.117774e-05
 ENSG00000141449 ENSE00002714420  2.095206 4.6771194 15.31847 9.082395e-05
 ENSG00000079134 ENSE00002726336 -3.935338 1.7760640 23.75956 1.091519e-06
 ENSG00000272975 ENSE00002783212 -4.120124 2.1119093 29.49263 5.612974e-08

What I expect to get after the merge:

  Gene.stable.ID  Exon.stable.ID   Exons/Gene     
2 ENSG00000100234 ENSE00000653439    2 
3 ENSG00000100234 ENSE00000653440    2 
4 ENSG00000101347 ENSE00000661813    3
5 ENSG00000101347 ENSE00000661813    3
6 ENSG00000101347 NA                 3

I have the TopTags table of differentially expressed exons and have already linked each exon to its gene. What I need is the number of exons each gene has, so I can tell whether a change really is an alternative splicing event and, if it is, which exon is missing (i.e. which exon-skipping event is taking place).
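
A minimal sketch of that merge in shell, assuming you already have a per-gene count file (gene ID, a tab, the number of exons, no header) and the DE table saved as unquoted, whitespace-separated text with one header line and the gene and exon IDs in the first two columns. `add_exon_counts` and the file names are hypothetical; genes missing from the count file get NA.

```shell
#!/usr/bin/env bash
# add_exon_counts (hypothetical helper)
#   $1 = per-gene counts: gene <TAB> number of exons (no header, must not be empty)
#   $2 = DE results: header line, then Gene.stable.ID, Exon.stable.ID, ...
# Prints gene <TAB> exon <TAB> exon count of that gene (NA if the gene is unknown).
add_exon_counts() {
  awk 'BEGIN { OFS = "\t" }
    NR == FNR { exons[$1] = $2; next }                 # first file: load the counts
    FNR == 1  { print $1, $2, "Exons/Gene"; next }     # header of the DE table
    { print $1, $2, (($1 in exons) ? exons[$1] : "NA") }' "$1" "$2"
}

# Usage (hypothetical file names):
# add_exon_counts exons_per_gene.tsv toptags.txt > merged.tsv
```

The `NR == FNR` test is the usual awk idiom for "still reading the first file"; it breaks if that file is empty, hence the note in the comments.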

I'm fairly new to bioinformatics and couldn't find an appropriate database to get this information from.

I hope my question is clear enough. I have been looking for a solution, but I don't want to waste a day searching the web; maybe someone can point me to that information.

Thanks again!

Cheers.

RNA-Seq alternative splicing

7.5 years ago

How about just using MySQL and the UCSC public database server?

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19  -e 'select G.gene,G.transcript,K.exonCount from  ensGene as K, ensGtp as G where G.transcript=K.name'
    +-----------------+-----------------+-----------+
    | gene            | transcript      | exonCount |
    +-----------------+-----------------+-----------+
    | ENSG00000215700 | ENST00000400776 |         3 |
    | ENSG00000215699 | ENST00000374457 |         6 |
    | ENSG00000215699 | ENST00000433682 |         5 |
    | ENSG00000215699 | ENST00000374449 |         6 |
    | ENSG00000238953 | ENST00000459035 |         1 |
    | ENSG00000252392 | ENST00000516583 |         1 |
    | ENSG00000238315 | ENST00000459564 |         1 |
    | ENSG00000215642 | ENST00000545452 |        10 |
    | ENSG00000234295 | ENST00000425945 |        21 |
    | ENSG00000238602 | ENST00000458901 |         1 |
    | ENSG00000238421 | ENST00000459549 |         1 |
    | ENSG00000215689 | ENST00000400767 |         6 |
    | ENSG00000229510 | ENST00000421727 |         2 |
    | ENSG00000220823 | ENST00000402833 |         2 |
    | ENSG00000255783 | ENST00000536753 |         2 |
    | ENSG00000256974 | ENST00000539902 |         2 |
    | ENSG00000264723 | ENST00000577479 |         1 |
    | ENSG00000252127 | ENST00000516318 |         1 |
    | ENSG00000215750 | ENST00000400839 |         2 |
    | ENSG00000221485 | ENST00000408558 |         1 |
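
The query above returns one row per transcript, not per gene. A minimal awk sketch that collapses it to one row per gene, assuming the query is re-run with the mysql client's `-N` (no column names) and `-B` (tab-separated batch output) options. The helper name `summarize_by_gene` is hypothetical. It reports the number of transcripts and the largest exonCount among them; note that the largest isoform's exon count is not necessarily the number of distinct exons in the gene.

```shell
#!/usr/bin/env bash
# summarize_by_gene (hypothetical helper): reads tab-separated rows
#   gene <TAB> transcript <TAB> exonCount
# on stdin and prints, per gene:
#   gene <TAB> number of transcripts <TAB> largest exonCount of any transcript
summarize_by_gene() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    {
      n[$1]++                                   # transcripts seen for this gene
      if (!($1 in max) || $3 + 0 > max[$1]) max[$1] = $3 + 0
    }
    END { for (g in n) print g, n[g], max[g] }' |
    sort -k1,1                                  # awk for-in order is unspecified
}

# Usage (same query as above, with -N -B added):
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -B -D hg19 \
#   -e 'select G.gene,G.transcript,K.exonCount from ensGene as K, ensGtp as G where G.transcript=K.name' \
#   | summarize_by_gene
```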

Thanks! I don't have any MySQL experience, so I'm going to try something slightly different. Do you know where I can find this information in Ensembl? It's a bit easier for me to download from there, but I cannot find the exonCount field.

Thanks a lot, Pierre :)


"Where can I find this information in Ensembl?"

Try BioMart (http://www.ensembl.org/biomart/martview/), exporting the gene, transcript and exon stable IDs (in that order),

and then run:

cut -f 1,2 input.txt | sort | uniq -c
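
The one-liner above counts rows per gene/transcript pair, i.e. exons per transcript. To get the number of distinct exons per gene instead, here is a minimal sketch, assuming the BioMart export is tab-separated with a single header line and columns in the order gene stable ID, transcript stable ID, exon stable ID. The helper name `exons_per_gene` and the file name are hypothetical. An exon shared by several transcripts is counted only once.

```shell
#!/usr/bin/env bash
# exons_per_gene (hypothetical helper): distinct exons per gene from a
# BioMart TSV export with columns gene, transcript, exon and one header line.
# Prints gene <TAB> number of distinct exons.
exons_per_gene() {
  tail -n +2 "$1" |      # drop the BioMart header line
    cut -f 1,3 |         # keep gene and exon IDs only
    sort -u |            # an exon shared by several transcripts appears once
    cut -f 1 |           # one gene ID per distinct exon
    uniq -c |            # lines are grouped by gene, so this counts per gene
    awk 'BEGIN { OFS = "\t" } { print $2, $1 }'   # reorder to gene <TAB> count
}

# Usage (hypothetical file name):
# exons_per_gene mart_export.txt > exons_per_gene.tsv
```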

Working on this! The only problem is that I still can't find a column with the exon count per gene, but I think I can work something out.

Thanks Pierre.


One question: what if I want hg38? :P

I'm trying the MySQL approach, but I couldn't find the hg38 database. Could you help, please? :)


I just checked: UCSC goes from hg16 to hg19, and since then it has converged with Ensembl and GENCODE.

Thanks for your help, Pierre, but I need hg38 because my reads were mapped to hg38.
