Hi all,
I'm annotating UTRs with PASA, following https://github.com/PASApipeline/PASApipeline/wiki/PASA_genome_annotation.
This process requires a GFF3 file. The sample GFF3 that PASA provides looks like this:
gi|68711 TIGR gene 5662 6138 . + . ID=68711.t00001;Name=my protein name
gi|68711 TIGR mRNA 5662 6138 . + . ID=model.68711.m00001;Parent=68711.t00001
gi|68711 TIGR exon 5662 6138 . + . ID=68711.e00001;Parent=model.68711.m00001
gi|68711 TIGR CDS 5662 6138 . + 0 ID=5662_6138cds_of_68711.m00001;Parent=model.68711.m00001
But the GFF3 file I downloaded from NCBI looks like this:
AP018495.1 DDBJ region 1 381277 . + . ID=AP018495.1:1..381277;Dbxref=taxon:2080449;gbkey=Src;isolation-source=A water/soil sample collected from the Jozankei Onsen;mol_type=genomic DNA
AP018495.1 DDBJ CDS 261 647 . - 0 ID=cds-BBI30141.1;Dbxref=NCBI_GP:BBI30141.1;Name=BBI30141.1;Note=ORF1;gbkey=CDS;product=hypothetical protein;protein_id=BBI30141.1
AP018495.1 DDBJ CDS 706 1308 . + 0 ID=cds-BBI30142.1;Dbxref=NCBI_GP:BBI30142.1;Name=BBI30142.1;Note=ORF2;gbkey=CDS;product=putative HD hydrolase;protein_id=BBI30142.1
As you can see, my file contains only "CDS" features (plus a "region" line), whereas the sample GFF3 has a "gene", "mRNA", "exon", and "CDS" feature for each gene model, linked through ID/Parent attributes. How can I convert my file into the required format?
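To make the question concrete, here is a minimal Python sketch of the kind of conversion I have in mind. It is only an illustration under several assumptions, not a tested PASA workflow: it assumes the real file is tab-separated, that every CDS line is a complete single-exon gene (so the gene, mRNA, and exon simply reuse the CDS coordinates), and that non-CDS features such as "region" can be dropped. The function name `expand_cds_only_gff3` and the `gene-`/`mRNA-`/`exon-`/`cds-` ID prefixes are my own invention, and I have not confirmed that PASA accepts the result.

```python
def expand_cds_only_gff3(lines):
    """Yield GFF3 lines in which each CDS row is wrapped in gene/mRNA/exon rows.

    Assumptions (hypothetical, for illustration only):
      * input columns are tab-separated, as in standard GFF3;
      * each CDS row is one complete, single-exon gene;
      * non-CDS features (e.g. "region") are not needed and are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Pass blank lines and comments/directives (e.g. "##gff-version 3") through.
        if not line or line.startswith("#"):
            yield line
            continue

        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # drop anything that is not a well-formed CDS row

        seqid, source, _ftype, start, end, score, strand, phase, attrs = cols

        # Parse "key=value;key=value" attributes into a dict.
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)

        # Use protein_id as the base identifier, falling back to ID or coordinates.
        base = attr.get("protein_id") or attr.get("ID") or f"{seqid}_{start}_{end}"
        name = attr.get("product", base)

        gene_id = f"gene-{base}"
        mrna_id = f"mRNA-{base}"

        def row(ftype, ph, attributes):
            # Every derived feature reuses the coordinates and strand of the CDS.
            return "\t".join(
                [seqid, source, ftype, start, end, score, strand, ph, attributes]
            )

        yield row("gene", ".", f"ID={gene_id};Name={name}")
        yield row("mRNA", ".", f"ID={mrna_id};Parent={gene_id}")
        yield row("exon", ".", f"ID=exon-{base};Parent={mrna_id}")
        yield row("CDS", phase, f"ID=cds-{base};Parent={mrna_id}")
```

The idea would be to open the NCBI file, pass the file object to `expand_cds_only_gff3`, and write each yielded line to a new file. A CDS split across several lines (spliced genes) would need to be grouped by ID first, which this sketch does not attempt.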
Thanks in advance.
Cross-posted on Reddit: https://www.reddit.com/r/bioinformatics/comments/lpjmp5/how_to_convert_gff3_format_pasapipeline/