Hi, I need to find out which genes have only one exon-skipping event in the annotation database known as TXdb. Is there any way to get more information about TXdb? The SpliceTrap tool comes with an hg19 TXdb database, which includes an evidence file called TXdb.evi, but I need to know more about this file before I can interpret it correctly. Here are a few lines from the TXdb.evi file:
AA-AA-1-100316530-100316590.0[S] NM_000644,ENST00000370165,A3-178-3535-4041[A1][5/5][UPT],uc001dsl.1,
AA-AA-1-100605963-100606005.0[L] uc001dsv.1,ENST00000370141,uc009wea.1,CA-54482-9562-10032-10102-10258[INC][36/2],A3-54482-10102-10365[A0][25/7],NM_019083,
AA-AA-1-100605963-100606005.0[S] A3-54482-10102-10365[A1][25/7],
AA-AA-1-100606407-100606522.0[L] A3-54482-10365-10870[A0][13/6],uc001dsv.1,ENST00000370141,uc009wea.1,NM_019083,
AA-AA-1-100606407-100606522.0[S] A3-54482-10365-10870[A1][13/6],
AA-AA-1-100613449-100613648.0[L] uc001dsv.1,ENST00000370141,CA-54482-13994-17744-18177-18475[INC][4/2][DNT],NM_019083,A3-54482-13994-18177[A0][4/2],
AA-AA-1-100613449-100613648.0[S] A3-54482-13994-18177[A1][4/2],
AA-AA-1-100634150-100634190.0[L] ENST00000342895,ENST00000370136,
AA-AA-1-100634150-100634190.0[S] CA-127495-3081-7845-7885-12621[SKIP][27/4][UPT],uc001dsx.1,
AA-AA-1-100733629-100733707.0[L] A3-8634-3467-5138[A0][4/9/108],ENST00000498617,
AA-AA-1-100733629-100733707.0[S] ENST00000370128,uc001dtc.1,CA-8634-3467-4342-4381-4994[SKIP][9/109],NM_003729,A3-8634-3467-5138[A2][4/9/108],
AA-AA-1-100733692-100733707.0[L] A3-8634-3467-5138[A1][4/9/108],
AA-AA-1-101342417-101342420.0[L] ENST00000535414,NM_001439,A3-2135-20387-21069[A0][42/3],uc001dtl.1,ENST00000370113,NM_001033025,ENST00000450240,ENST00000370114,uc001dtk.1,
AA-AA-1-101342417-101342420.0[S] uc001dtm.1,A3-2135-20387-21069[A1][42/3],
AA-AA-1-101354384-101354420.0[L] NM_001439,uc001dtk.1,
AA-AA-1-101354384-101354420.0[S] uc001dtm.1,uc001dtl.1,NM_001033025,
AA-AA-1-101354384-101354420.1[L] A3-2135-3230-9110[A0][38/81][UPT],
AA-AA-1-101354384-101354420.1[S] A3-2135-3230-9110[A1][38/81][UPT],
AA-AA-1-101456184-101456187.0[L] A3-51611-36170-38736[A0][5/3][DNT],CA-51611-33697-36066-36170-38175[INC][45/4][DNT],
AA-AA-1-101456184-101456187.0[S] A3-51611-36170-38736[A1][5/3][DNT],CA-51611-33697-36066-36170-38178[INC][7/2][DNT],
AA-AA-1-101456184-101456187.2[L] uc001dtz.1,NM_001077394,uc001dty.1,uc001dtu.1,NM_015958,uc001dtt.1,uc001dtv.1,uc001dtw.1,uc001dts.1,
AA-AA-1-101456184-101456187.2[S] uc001dtr.1,NM_001077395,
AA-AA-1-101456184-101456187.3[L] CA-51611-33697-36066-36170-38175[SKIP][45/4][DNT],
AA-AA-1-101456184-101456187.3[S] CA-51611-33697-36066-36170-38178[SKIP][7/2][DNT],
AA-AA-1-101467100-101467143.0[S] uc001dtu.1,
AA-AA-1-101467100-101467143.1[L] ENST00000477293,
AA-AA-1-101467100-101467143.1[S] uc001dty.1,ENST00000498372,ENST00000464270,
AA-AA-1-101540831-101540950.0[L] uc001dua.2,ENST00000421013,
AA-AA-1-101540831-101540950.0[S] ENST00000454721,
AA-AA-1-1019391-1019466.0[S] ENST00000434641,
AA-AA-1-1025808-1025967.0[L] ENST00000473600,A3-54991-27885-29004[A0][3/88],
AA-AA-1-1025808-1026754.0[L] ENST00000477196,
AA-AA-1-1026920-1026945.0[L] uc009vju.1,ENST00000467751,ENST00000379339,uc001acu.2,ENST00000294576,uc001acr.2,uc001act.2,ENST00000427787,ENST00000379319,NM_017891,ENST00000482816,ENST00000477196,A3-54991-27366-27885[A1][3/93/2],ENST00000442117,uc001acm.2,ENST00000448924,uc001acs.2,uc001acp.2,ENST00000462097,ENST00000379325,ENST00000421241,ENST00000437760,ENST00000434641,
There are many naming conventions and abbreviations I am not familiar with, so I need help from someone who has already worked with TXdb. For my analysis I need the genes that have only one exon-skipping event. In this file an exon-skipping event is marked by CA, and each CA entry is accompanied by one of two annotations in square brackets, INC or SKIP, which might mean "included" and "skipped". There are also other annotations and fractions in the file that I don't know how to interpret. For example, what do the identifiers starting with uc00 (e.g. uc001dsl.1) mean, what does [DNT] mean, and what do [L] and [S] in square brackets mean in the first column?
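Whatever the annotations turn out to mean, an event entry can at least be broken into its parts so they can be tabulated. The sketch below assumes that the leading code (`CA`, `A3`, ...) is the event type, the second dash-separated field identifies the gene, the remaining numbers are coordinates, and each `[...]` group (INC/SKIP, A0/A1, the fractions, UPT/DNT) is a separate annotation. These interpretations are assumptions, not confirmed by SpliceTrap documentation, and the names `Event` and `parse_event` are hypothetical.

```python
import re
from typing import NamedTuple

# An annotated event such as "CA-54482-9562-10032-10102-10258[INC][36/2][DNT]".
# Assumed layout: TYPE-GENE-COORD-...-COORD followed by zero or more [...] groups.
EVENT_RE = re.compile(r"^(?P<body>[A-Za-z0-9]+(?:-\d+)+)(?P<tags>(?:\[[^\]]*\])*)$")


class Event(NamedTuple):
    event_id: str  # the entry without its bracketed annotations
    kind: str      # e.g. "CA" or "A3"
    gene: str      # second field, assumed to identify the gene
    coords: tuple  # remaining numeric fields (meaning not documented here)
    tags: tuple    # contents of each [...] group, in order


def parse_event(entry: str) -> Event:
    """Split one annotated TXdb.evi entry into type, gene, coordinates and tags."""
    m = EVENT_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"not an annotated event: {entry!r}")
    kind, gene, *coords = m["body"].split("-")
    tags = tuple(re.findall(r"\[([^\]]*)\]", m["tags"]))
    return Event(m["body"], kind, gene, tuple(int(c) for c in coords), tags)
```

With this split, `CA-54482-9562-10032-10102-10258[INC][36/2]` becomes type `CA`, gene field `54482`, four coordinates and the tags `INC` and `36/2`, which makes it easy to count how often each tag occurs.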
There are also cases that have only one exon-skipping event and only one exon, e.g.:
AA-AA-1-101456184-101456187.3[L] CA-51611-33697-36066-36170-38175[SKIP][45/4][DNT],
AA-AA-1-101456184-101456187.3[S] CA-51611-33697-36066-36170-38178[SKIP][7/2][DNT],
I would very much appreciate it if anyone could explain this or point me to a publication or resource with further details, and suggest the best way to get a list of genes that show only one exon-skipping event.
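One way to get such a list, once the format is better understood, is to collect every distinct CA event per gene and keep the genes with exactly one. The sketch below rests on three assumptions that still need to be confirmed: every entry starting with `CA-` is an exon-skipping event; the same event can appear on several lines (e.g. tagged `[INC]` on an `[L]` line and `[SKIP]` on another line), so events are deduplicated by their ID with the `[...]` annotations stripped; and the second dash-separated field of a CA ID identifies the gene. The function name `single_skip_genes` is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: "CA-<gene>-<n>-<n>-..." marks an exon-skipping event, and the
# ID without its [...] annotations is unique per event.
CA_RE = re.compile(r"\bCA-(\d+)(?:-\d+)+")


def single_skip_genes(lines):
    """Return the gene fields that have exactly one distinct CA event."""
    events_by_gene = defaultdict(set)
    for line in lines:
        for m in CA_RE.finditer(line):
            # group(0) is the event ID without brackets; group(1) is the gene field
            events_by_gene[m.group(1)].add(m.group(0))
    return sorted(g for g, ids in events_by_gene.items() if len(ids) == 1)
```

The function accepts any iterable of lines, so an open handle on TXdb.evi can be passed directly. Note that IDs differing in a single coordinate (e.g. `CA-51611-...-38175` and `CA-51611-...-38178`) count as two events here; whether they should be merged depends on what those coordinates mean. Run on the excerpt lines in this post only, it would return `['127495', '8634']`, but the full file may contain further events for those genes.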
I am not familiar with TXdb, but if you explain how you want to parse that big text file, I can write the script for you.
Scripting is not the problem; it's just about understanding the TXdb format a little better. I have looked on PubMed but can't seem to find any information about it. There is an R package for TXDB, and I would really appreciate help from someone who has already worked with TXdb or with the R package for TXdb.