Hi everyone,
I am annotating the Oryza sativa genome with the MAKER annotation pipeline. In the MAKER configuration file, I turned on "alternative splicing". Here is the start of my final GFF3 file (combining homology-based cDNA/EST evidence with ab initio predictions from SNAP and Augustus):
head maker_annotation05.gff
##gff-version 3
Oj_ERR3890993_253 maker gene 39 122 . - . ID=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1;Name=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1
Oj_ERR3890993_253 maker mRNA 39 122 . - . ID=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1;Parent=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1;Name=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1;_AED=0.28;_eAED=0.28;_QI=0|-1|0|1|-1|0|1|0|28;score=27.91839
Oj_ERR3890993_253 maker exon 39 122 . - . ID=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1:1;Parent=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1
Oj_ERR3890993_253 maker CDS 39 122 . - 0 ID=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1:cds;Parent=maker-Oj_ERR3890993_253-exonerate_protein2genome-gene-0.1-mRNA-1
Oj_ERR3890993_253 maker gene 316 1137 . + . ID=maker-Oj_ERR3890993_253-augustus-gene-0.1;Name=maker-Oj_ERR3890993_253-augustus-gene-0.1
Oj_ERR3890993_253 maker mRNA 316 1137 . + . ID=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1;Parent=maker-Oj_ERR3890993_253-augustus-gene-0.1;Name=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1;_AED=0.30;_eAED=0.30;_QI=0|0|0|0.33|0|0|3|0|182
Oj_ERR3890993_253 maker exon 316 594 . + . ID=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1:1;Parent=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1
Oj_ERR3890993_253 maker exon 790 970 . + . ID=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1:2;Parent=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1
Oj_ERR3890993_253 maker exon 1049 1137 . + . ID=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1:3;Parent=maker-Oj_ERR3890993_253-augustus-gene-0.1-mRNA-1
cat ../maker_annotation05.gff | grep -v "^#" | awk '{print $3}' | sort | uniq -c | sort -nr
394975 CDS
243395 exon
83406 mRNA
48897 three_prime_UTR
41785 five_prime_UTR
36786 gene
I want to keep only the longest isoform for each gene in my GFF3 file. I have tried some scripts from ChatGPT, but the results are inconsistent: after filtering, the number of genes does not equal the number of mRNAs. Does anyone know how to do this? Many thanks,
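One way to approach this is sketched below in Python. It is a minimal sketch under several assumptions, not a definitive implementation: "longest" is taken to mean the largest summed CDS length (ties broken by summed exon length, then by first occurrence in the file), the file is tab-separated GFF3, every mRNA has a single gene as Parent, and exons or CDS shared between isoforms list their parents as a comma-separated Parent attribute. Only genes that have at least one mRNA are written, along with the chosen mRNA and its child features, so the gene and mRNA counts match by construction. The function and script names are made up for illustration. Dedicated tools exist for the same job (AGAT, for example, ships agat_sp_keep_longest_isoform.pl) and are likely more robust.

```python
import sys
from collections import defaultdict


def parse_attrs(field):
    """Parse a GFF3 attribute column ('ID=x;Parent=y') into an ordered dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def keep_longest_isoforms(lines):
    """Return GFF3 lines with exactly one mRNA (the longest) per gene."""
    rows = [ln.rstrip("\n").split("\t") for ln in lines if not ln.startswith("#")]
    rows = [r for r in rows if len(r) == 9]  # skips blank lines and any FASTA section

    mrna_gene = {}               # mRNA ID -> parent gene ID
    cds_len = defaultdict(int)   # mRNA ID -> summed CDS length
    exon_len = defaultdict(int)  # mRNA ID -> summed exon length
    for r in rows:
        attrs = parse_attrs(r[8])
        span = int(r[4]) - int(r[3]) + 1
        if r[2] == "mRNA":
            mrna_gene[attrs["ID"]] = attrs["Parent"]
        elif r[2] in ("CDS", "exon"):
            totals = cds_len if r[2] == "CDS" else exon_len
            for parent in attrs.get("Parent", "").split(","):
                totals[parent] += span

    # Assumption: longest = most CDS bases, then most exon bases.
    best = {}  # gene ID -> ((cds_len, exon_len), mRNA ID)
    for mrna, gene in mrna_gene.items():
        key = (cds_len[mrna], exon_len[mrna])
        if gene not in best or key > best[gene][0]:
            best[gene] = (key, mrna)
    keep = {mrna for _, mrna in best.values()}

    out = ["##gff-version 3"]
    for r in rows:
        attrs = parse_attrs(r[8])
        if r[2] == "gene":
            if attrs.get("ID") in best:
                out.append("\t".join(r))
        elif r[2] == "mRNA":
            if attrs.get("ID") in keep:
                out.append("\t".join(r))
        else:
            # Keep child features of retained mRNAs; drop discarded parents.
            parents = [p for p in attrs.get("Parent", "").split(",") if p in keep]
            if parents:
                attrs["Parent"] = ",".join(parents)
                r = r[:8] + [";".join(f"{k}={v}" for k, v in attrs.items())]
                out.append("\t".join(r))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print("\n".join(keep_longest_isoforms(handle)))
```

Saved under a hypothetical name such as keep_longest_isoform.py, it could be run as "python keep_longest_isoform.py maker_annotation05.gff > longest_isoform.gff". Re-running the feature count from the question on the output should then report the same number for gene and mRNA.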
Many thanks for your help. It works with my data.