How to easily add missing exon features to a GFF file?
2.5 years ago
JC • 0

Hello to everyone,

I am working with a eukaryotic organism, and the official annotation looks like this:

I       PomBase gene    1798347 1799015 .       +       .       ID=SPAC1002.01;Name=mrx11
I       PomBase mRNA    1798347 1799015 .       +       .       ID=SPAC1002.01.1;Parent=SPAC1002.01
I       PomBase CDS     1798347 1798830 .       +       0       ID=SPAC1002.01.1:exon:1;Parent=SPAC1002.01.1
I       PomBase intron  1798831 1798959 .       +       .       ID=SPAC1002.01.1:intron:1;Parent=SPAC1002.01.1
I       PomBase CDS     1798960 1799015 .       +       0       ID=SPAC1002.01.1:exon:2;Parent=SPAC1002.01.1
I       PomBase gene    1799061 1800053 .       +       .       ID=SPAC1002.02;Name=pom34
I       PomBase mRNA    1799061 1800053 .       +       .       ID=SPAC1002.02.1;Parent=SPAC1002.02
I       PomBase five_prime_UTR  1799061 1799127 .       +       .       ID=SPAC1002.02.1:five_prime_UTR:1;Parent=SPAC1002.02.1
I       PomBase CDS     1799128 1799817 .       +       0       ID=SPAC1002.02.1:exon:1;Parent=SPAC1002.02.1
I       PomBase three_prime_UTR 1799818 1800053 .       +       .       ID=SPAC1002.02.1:three_prime_UTR:1;Parent=SPAC1002.02.1
I       PomBase gene    1799915 1803141 .       -       .       ID=SPAC1002.03c;Name=gls2
I       PomBase mRNA    1799915 1803141 .       -       .       ID=SPAC1002.03c.1;Parent=SPAC1002.03c
I       PomBase five_prime_UTR  1802984 1803141 .       -       .       ID=SPAC1002.03c.1:five_prime_UTR:1;Parent=SPAC1002.03c.1
I       PomBase CDS     1800212 1802983 .       -       0       ID=SPAC1002.03c.1:exon:1;Parent=SPAC1002.03c.1
I       PomBase three_prime_UTR 1799915 1800211 .       -       .       ID=SPAC1002.03c.1:three_prime_UTR:1;Parent=SPAC1002.03c.1
I       PomBase gene    1803624 1804491 .       -       .       ID=SPAC1002.04c;Name=taf11
I       PomBase mRNA    1803624 1804491 .       -       .       ID=SPAC1002.04c.1;Parent=SPAC1002.04c
I       PomBase five_prime_UTR  1804373 1804491 .       -       .       ID=SPAC1002.04c.1:five_prime_UTR:1;Parent=SPAC1002.04c.1
I       PomBase CDS     1803773 1804372 .       -       0       ID=SPAC1002.04c.1:exon:1;Parent=SPAC1002.04c.1
I       PomBase three_prime_UTR 1803624 1803772 .       -       .       ID=SPAC1002.04c.1:three_prime_UTR:1;Parent=SPAC1002.04c.1

As you can see, no exons are annotated. I think I could reconstruct them "manually" with a Python script that combines the five_prime_UTR, three_prime_UTR and CDS features of each transcript. However, different transcript types could easily produce a lot of exceptions. My question is: is there a tool that can add the missing "exon" lines by merging the UTR and CDS features?

Thank you very much

2.5 years ago
Juke34 8.9k

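Just run agat_convert_sp_gxf2gxf.pl from AGAT. It parses the GFF and adds missing features, including exons inferred from the CDS and UTR features. For example: agat_convert_sp_gxf2gxf.pl -g input.gff -o output.gff (the file names here are placeholders).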
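
For comparison, here is a minimal Python sketch of the manual approach described in the question: collect the CDS, five_prime_UTR and three_prime_UTR features of each transcript, merge the pieces that overlap or touch, and emit one exon line per merged block. It is only an illustration, not what AGAT does internally. It assumes tab-separated GFF3 columns, a single Parent per feature, and that those three types are the only exonic features in the file. The function names and the "inferred_exon" ID pattern are made up for this example (the existing CDS IDs already contain "exon", so a different label avoids ID collisions).

```python
from collections import defaultdict

# Assumption: these are the only feature types that make up exons,
# as in the excerpt from the question.
EXONIC_TYPES = {"CDS", "five_prime_UTR", "three_prime_UTR"}


def parent_of(attributes):
    """Return the Parent value from a GFF3 attribute column, or None."""
    for field in attributes.split(";"):
        key, _, value = field.partition("=")
        if key.strip() == "Parent":
            return value.strip()
    return None


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def infer_exons(gff_lines):
    """Build exon lines from the CDS/UTR features of each transcript."""
    # (seqid, source, strand, transcript ID) -> list of (start, end)
    pieces = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in EXONIC_TYPES:
            continue
        parent = parent_of(cols[8])
        if parent is None:
            continue
        key = (cols[0], cols[1], cols[6], parent)
        pieces[key].append((int(cols[3]), int(cols[4])))

    exons = []
    for (seqid, source, strand, parent), intervals in pieces.items():
        merged = merge_intervals(intervals)
        if strand == "-":
            merged.reverse()  # number exons in transcription order
        for number, (start, end) in enumerate(merged, 1):
            attrs = f"ID={parent}:inferred_exon:{number};Parent={parent}"
            exons.append("\t".join(
                [seqid, source, "exon", str(start), str(end),
                 ".", strand, ".", attrs]
            ))
    return exons
```

Calling infer_exons(open("annotation.gff")) (a placeholder file name) returns the new exon lines, which still have to be written back next to their transcripts. With the excerpt above, the two CDS blocks of SPAC1002.01.1 stay as two exons because the intron separates them, while the UTRs and CDS of SPAC1002.03c.1 collapse into a single exon. The sketch ignores features with several comma-separated parents and non-coding transcripts that have no CDS or UTR lines at all, which is the kind of exception a dedicated tool such as AGAT is meant to handle.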
