Filter GFF3 file for complete genes
5.6 years ago

I have a GFF3 file containing full-length gene models. However, a few of these gene models have multiple UTRs of the same type, and I would like to filter them out. Is there a utility available for this?

scaffold105size588288 transdecoder gene 130390 132407 . + .
scaffold105size588288 transdecoder mRNA 130390 132407 . + .
scaffold105size588288 transdecoder five_prime_UTR 130390 130818 . + .
scaffold105size588288 transdecoder exon 130390 132407 . + .
scaffold105size588288 transdecoder CDS 130819 131979 . + 0
scaffold105size588288 transdecoder three_prime_UTR 131980 132407 . + .

scaffold105size588288 transdecoder gene 278652 281390 . + .
scaffold105size588288 transdecoder mRNA 278652 281390 . + .
scaffold105size588288 transdecoder five_prime_UTR 278652 278776 . + .
scaffold105size588288 transdecoder exon 278652 278847 . + .
scaffold105size588288 transdecoder CDS 278777 278847 . + 0
scaffold105size588288 transdecoder exon 279283 280020 . + .
scaffold105size588288 transdecoder CDS 279283 279589 . + 1
scaffold105size588288 transdecoder exon 280311 280393 . + .
scaffold105size588288 transdecoder three_prime_UTR 280311 280393 . + .
scaffold105size588288 transdecoder three_prime_UTR 280593 280678 . + .
scaffold105size588288 transdecoder three_prime_UTR 280757 280812 . + .

In this trimmed example, I need to remove the second gene model because it has three 3' UTRs, and keep the first one, which is the more complete set.

Thanks in advance.

assembly next-gen genome • 1.2k views

Select the rows where column 3 (the feature type) is "gene". You can find ways to do this with awk or sed by searching online.
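
For reference, selecting rows by feature type could be sketched in Python as below. In GFF3 the feature type is the third tab-separated column. The function name `select_by_type` is made up for illustration, and this only lists the matching rows; on its own it does not check how many UTRs each gene has.

```python
def select_by_type(lines, feature_type="gene"):
    """Yield GFF3 lines whose third column (feature type) equals feature_type."""
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")  # assumes a tab-separated GFF3 file
        if len(fields) >= 3 and fields[2] == feature_type:
            yield line

# Possible usage (the file name is hypothetical):
# with open("annotation.gff3") as handle:
#     for gene_line in select_by_type(handle, "gene"):
#         print(gene_line)
```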

5.5 years ago
Juke34 8.9k

You can try the script agat_sp_manage_UTRs.pl from AGAT.
I guess it should do what you want.
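
If installing AGAT is not an option, here is a minimal Python sketch of the same idea. It is not the AGAT script and makes several assumptions: the file is tab-separated, the features of each gene follow their `gene` line directly (as in the example above), and a gene is kept only if it has at most `max_utrs` features of each UTR type. The names `filter_gene_blocks` and `max_utrs` are made up for illustration. Comment and directive lines are dropped from the output, and UTRs are counted per gene block rather than per mRNA, so a gene with several isoforms would need a stricter grouping (for example by the `Parent` attribute).

```python
from collections import Counter

UTR_TYPES = ("five_prime_UTR", "three_prime_UTR")

def filter_gene_blocks(lines, max_utrs=1):
    """Return feature lines of genes with at most max_utrs UTRs of each type.

    Assumes tab-separated GFF3 in which every 'gene' line starts a new block
    and is followed directly by its child features.
    """
    blocks = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        feature_type = line.split("\t")[2]
        # Start a new block at each gene line (or at the first feature seen).
        if feature_type == "gene" or not blocks:
            blocks.append([])
        blocks[-1].append(line)

    kept = []
    for block in blocks:
        counts = Counter(entry.split("\t")[2] for entry in block)
        if all(counts[utr] <= max_utrs for utr in UTR_TYPES):
            kept.extend(block)
    return kept

# Possible usage (file names are hypothetical):
# with open("input.gff3") as src, open("filtered.gff3", "w") as dst:
#     dst.write("\n".join(filter_gene_blocks(src)) + "\n")
```

On the two gene models from the question, this would keep the first (one 5' UTR, one 3' UTR) and drop the second (three 3' UTRs).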
