Generating Internal Exons Based On Known mRNA Isoforms
12.0 years ago
Puriney ▴ 330

This is kind of a coding strategy question.

Suppose a given gene has three exons and two isoforms: isoformA is exon1-exon2-exon3, while isoformB is exon1-exon3. Exon2 is what I want to pick out as the internal exon.

I have downloaded all the exon data from the UCSC Genome Browser, UCSC Genes track (fields selected from the primary and related tables), and I want to pick out all the "internal exons" in this sense.

The input looks something like this:

#isoform_name    chr    strand    ex_start    ex_end    gene_name
isoformA    chr1    +    10,30,    15,35    geneM
isoformB    chr1    +    10,20,30,    15,25,35    geneM
isoformC    chr1    +    40,50,    45,55    geneM

Thus the exon [20-25] is called the internal exon.

The key is handling two strings, the ex_start string and the ex_end string. Can anyone give me a hint on how to do this efficiently?

P.S. I know that HEXEvent and BioMart can provide such data sets, but I am curious how to do it with local code. Thanks a lot!

splicing • 2.6k views

Please, why are there more exstart and exend values provided?


isoformC has a missing comma in ex_start


Missing comma added, as @JC mentioned.

12.0 years ago

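Filtering out the internal exons using awk (tab-separated input; the sed step, which assumes GNU sed for `\t`, strips the trailing comma from each coordinate list so that the last split element is the last exon):

 sed 's/,\t/\t/g' input.txt |\
 awk -F '\t' 'BEGIN{OFS="\t"} {Sn=split($4,S,","); En=split($5,E,","); $4=sprintf("%s,%s",S[1],S[Sn]); $5=sprintf("%s,%s",E[1],E[En]); print}'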


isoformA       chr1       +       10,30       15,35       geneM
isoformB       chr1       +       10,30       15,35       geneM
isoformC       chr1       +       40,50       45,55       geneM
12.0 years ago
JC 13k

Perl option (don't forget to fix the comma in the isoformC):

 perl -plane 's/(\d+,).*?(\d+,)(\s+)(\d+,).*?(\d+\s+)/$1$2$3$4$5/' < in > out
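
Both one-liners above trim each isoform down to its first and last exon. If the goal is instead to list the exons that are internal in one isoform and skipped by another isoform of the same gene (the [20-25] case in the question), a minimal Python sketch could look like the following. It is only an illustration under some assumptions: the six whitespace-separated columns from the question, exons listed in ascending coordinate order, and isoforms grouped by gene name alone. The function names `parse_exons` and `skipped_internal_exons` are made up for this example.

```python
from collections import defaultdict

def parse_exons(starts, ends):
    """Turn '10,20,30,' and '15,25,35' into [(10, 15), (20, 25), (30, 35)]."""
    s = [int(x) for x in starts.split(",") if x]  # 'if x' drops the empty piece after a trailing comma
    e = [int(x) for x in ends.split(",") if x]
    if len(s) != len(e):
        raise ValueError(f"{len(s)} starts but {len(e)} ends")
    return list(zip(s, e))

def skipped_internal_exons(lines):
    """Map gene -> exons that are internal in one isoform and absent from
    another isoform whose span covers them (assumption: group by gene name only)."""
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        name, chrom, strand, starts, ends, gene = line.split()
        genes[gene].append(parse_exons(starts, ends))

    result = {}
    for gene, isoforms in genes.items():
        found = set()
        for exons in isoforms:
            for exon in exons[1:-1]:  # internal = neither first nor last exon
                for other in isoforms:
                    # the other isoform starts before and ends after this exon
                    spans = other[0][0] < exon[0] and other[-1][1] > exon[1]
                    if spans and exon not in other:
                        found.add(exon)
        if found:
            result[gene] = sorted(found)
    return result

example = """\
#isoform_name    chr    strand    ex_start    ex_end    gene_name
isoformA    chr1    +    10,30,    15,35    geneM
isoformB    chr1    +    10,20,30,    15,25,35    geneM
isoformC    chr1    +    40,50,    45,55    geneM
"""
print(skipped_internal_exons(example.splitlines()))  # {'geneM': [(20, 25)]}
```

The comparison is quadratic in the number of isoforms per gene, which is usually fine because a gene rarely has more than a few dozen isoforms. Exons are matched by exact coordinates, so alternative 5'/3' splice sites would need a looser test.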