How to convert a GFF3 file into the format required by PASApipeline
3.8 years ago
Ruixuan • 0

Hi all,

I'm doing UTR annotation with PASA, following https://github.com/PASApipeline/PASApipeline/wiki/PASA_genome_annotation.

This process requires a GFF3 file. The sample GFF3 it provides looks like this:

gi|68711        TIGR    gene    5662    6138    .       +       .       ID=68711.t00001;Name=my protein name
gi|68711        TIGR    mRNA    5662    6138    .       +       .       ID=model.68711.m00001;Parent=68711.t00001
gi|68711        TIGR    exon    5662    6138    .       +       .       ID=68711.e00001;Parent=model.68711.m00001
gi|68711        TIGR    CDS     5662    6138    .       +       0       ID=5662_6138cds_of_68711.m00001;Parent=model.68711.m00001

But the GFF3 file I downloaded from NCBI looks like this:

AP018495.1      DDBJ    region  1       381277  .       +       .       ID=AP018495.1:1..381277;Dbxref=taxon:2080449;gbkey=Src;isolation-source=A water/soil sample collected from the Jozankei Onsen;mol_type=genomic DNA
AP018495.1      DDBJ    CDS     261     647     .       -       0       ID=cds-BBI30141.1;Dbxref=NCBI_GP:BBI30141.1;Name=BBI30141.1;Note=ORF1;gbkey=CDS;product=hypothetical protein;protein_id=BBI30141.1
AP018495.1      DDBJ    CDS     706     1308    .       +       0       ID=cds-BBI30142.1;Dbxref=NCBI_GP:BBI30142.1;Name=BBI30142.1;Note=ORF2;gbkey=CDS;product=putative HD hydrolase;protein_id=BBI30142.1

As you can see, my file only has "CDS" features, whereas the sample GFF3 has "gene", "mRNA", "exon", and "CDS". How can I convert my file into the required format?

Thanks in advance

RNA-Seq Assembly • 934 views
3.8 years ago
Juke34 8.9k

Using AGAT:

agat_convert_sp_gxf2gxf.pl --gff input.gff --ct protein_id -o standardized_file.gff

In this example, the value of the protein_id attribute is used to collect the CDS features that belong to the same mRNA. If a gene/locus has several isoforms, each isoform will get its own gene parent (apparently there is no way to tell from your file whether there are isoforms). Adding --merge_loci will merge mRNAs that overlap in their CDS parts under the same parent gene.
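To make the idea concrete, here is a rough Python sketch of what "grouping CDS by protein_id and adding the missing parents" means. This is not AGAT's implementation, just an illustration under simple assumptions: the function names (parse_attributes, add_parents) and the generated IDs (gene-…, mrna-…, exon-…) are made up for this example, the input is assumed to be tab-separated, and each exon is simply given the same coordinates as its CDS segment because the file carries no UTR information.

```python
from collections import OrderedDict


def parse_attributes(field):
    """Turn a GFF3 attribute string such as 'ID=x;Name=y' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def add_parents(gff_lines, key="protein_id"):
    """Group CDS rows by `key` and emit gene/mRNA/exon/CDS rows per group.

    Hypothetical helper: the ID scheme below is invented for illustration.
    """
    groups = OrderedDict()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # only CDS rows are used; 'region' etc. are ignored
        attrs = parse_attributes(cols[8])
        if key in attrs:
            groups.setdefault(attrs[key], []).append(cols)

    out = []
    for name, rows in groups.items():
        seqid, source, strand = rows[0][0], rows[0][1], rows[0][6]
        start = min(int(r[3]) for r in rows)
        end = max(int(r[4]) for r in rows)
        gene_id = "gene-" + name
        mrna_id = "mrna-" + name

        def row(ftype, s, e, phase, attributes):
            return "\t".join(
                [seqid, source, ftype, str(s), str(e), ".", strand, phase, attributes]
            )

        # One gene and one mRNA spanning all CDS segments of this group.
        out.append(row("gene", start, end, ".", "ID=" + gene_id))
        out.append(row("mRNA", start, end, ".", "ID=%s;Parent=%s" % (mrna_id, gene_id)))
        # One exon per CDS segment (same coordinates), followed by the CDS itself.
        for i, r in enumerate(sorted(rows, key=lambda r: int(r[3])), 1):
            out.append(row("exon", r[3], r[4], ".",
                           "ID=exon-%s-%d;Parent=%s" % (name, i, mrna_id)))
            out.append(row("CDS", r[3], r[4], r[7],
                           "ID=cds-%s;Parent=%s" % (name, mrna_id)))
    return out
```

You could call it as add_parents(open("input.gff")) and write the returned lines to a new file. For real data the AGAT command above is the safer choice, since it also handles sorting, existing parent features, and the many edge cases this sketch ignores.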

Thank you so much!!!!
