Wrong naming of transcript in gtf/gff3
23 months ago
Diego ▴ 110

Hi all,

Can someone tell me why a GTF file that uses attributes such as transcript_id "jg1.t1"; gene_id "jg1"; is not accepted as a proper GFF file?

Thanks in advance

format gtf gff3 • 1.9k views

Who said it's wrong?


Hi,

Software packages such as EvidenceModeler, Augustus, and gff3toolkit always say that the ID is wrong. For example, on this line:

CAKLNU010000942.1   other_pred1 CDS 724 1083    0.38    +   0   transcript_id "jg1.t1"; gene_id "jg1";

This is a CDS. Is there a transcript with transcript_id "jg1.t1" and/or a gene with gene_id "jg1" in the very same file?


Yes. The GTF looks like this:

CAEGAH010020538.1      AUGUSTUS       transcript     407    7407   0.08   -      .      transcript_id "g1.t1"; gene_id "g1";
CAEGAH010020538.1      AUGUSTUS       exon   407    547    .      -      .      transcript_id "g1.t1"; gene_id "g1";
CAEGAH010020538.1      AUGUSTUS       exon   6202   7407   .      -      .      transcript_id "g1.t1"; gene_id "g1";
CAEGAH010020538.1      AUGUSTUS       CDS    407    547    .      -      0      transcript_id "g1.t1"; gene_id "g1";
CAEGAH010020538.1      AUGUSTUS       CDS    6202   7407   .      -      0      transcript_id "g1.t1"; gene_id "g1";

Or like this one, which I converted to GFF3:

CAKLNU010000942.1       gffcl   locus   724     2835    .       +       .       ID=RLOC_00000001;transcripts=jg1.t1
CAKLNU010000942.1       AUGUSTUS        transcript      724     2835    .       +       .       ID=jg1.t1;geneID=jg1;locus=RLOC_00000001
CAKLNU010000942.1       AUGUSTUS        CDS     724     1083    .       +       0       Parent=jg1.t1
CAKLNU010000942.1       AUGUSTUS        CDS     1181    1625    0.34    +       0       Parent=jg1.t1
CAKLNU010000942.1       AUGUSTUS        CDS     2270    2835    0.42    +       2       Parent=jg1.t1
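
In GFF3, a child feature points to its parent through `Parent=`, and that value should match an `ID=` somewhere in the file. A minimal Python sketch of that check follows; the function names `parse_gff3_attributes` and `unresolved_parents` are made up for illustration, and the check covers only this one aspect of the format, not everything a validator such as gff3toolkit may report.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column9.strip().strip(";").split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def unresolved_parents(lines):
    """Return Parent values that match no ID in the given GFF3 lines."""
    ids, parents = set(), []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        attrs = parse_gff3_attributes(fields[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        # Parent may hold several comma-separated IDs
        parents.extend(p for p in attrs.get("Parent", "").split(",") if p)
    return sorted(set(parents) - ids)


# Two records from the excerpt above, rebuilt with explicit tabs
records = [
    "\t".join(["CAKLNU010000942.1", "AUGUSTUS", "transcript", "724", "2835",
               ".", "+", ".", "ID=jg1.t1;geneID=jg1;locus=RLOC_00000001"]),
    "\t".join(["CAKLNU010000942.1", "AUGUSTUS", "CDS", "724", "1083",
               ".", "+", "0", "Parent=jg1.t1"]),
]
print(unresolved_parents(records))  # prints []
```

Note that the transcript record carries `geneID=jg1` rather than a `Parent` attribute, so this check sees no link from the transcript to a gene feature.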
23 months ago
Juke34 8.9k

GFF formats require ID attributes for features (gene, transcript, exon, etc.), while GTF requires gene_id on all features and transcript_id on all features except gene features. For more details on the differences between the two formats, you can read this: https://agat.readthedocs.io/en/latest/gxf.html
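
As a rough illustration of the GTF rule above, here is a minimal Python sketch that reports which required attributes are missing from a single GTF line. The helper names `parse_gtf_attributes` and `missing_gtf_attributes` are hypothetical, and the check covers only the gene_id/transcript_id rule, not a full format validation.

```python
import re

# Matches key "value"; pairs in the GTF attribute column
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_attributes(column9):
    """Return the GTF attribute column as a dict of key -> value."""
    return dict(_ATTR_RE.findall(column9))


def missing_gtf_attributes(line):
    """List the required attributes absent from one GTF feature line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature_type = fields[2]
    attrs = parse_gtf_attributes(fields[8])
    required = ["gene_id"]
    if feature_type != "gene":
        # every feature except the gene itself also needs transcript_id
        required.append("transcript_id")
    return [key for key in required if key not in attrs]


# The CDS line from the question, rebuilt with explicit tabs
cds = "\t".join(["CAKLNU010000942.1", "other_pred1", "CDS", "724", "1083",
                 "0.38", "+", "0", 'transcript_id "jg1.t1"; gene_id "jg1";'])
print(missing_gtf_attributes(cds))  # prints []
```

By this rule the CDS line from the question is fine as GTF; the same line read as GFF3 has neither an ID nor a Parent attribute.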


I think this is what I am looking for: agat_convert_sp_gxf2gxf.pl


Thanks for the very useful reply. I wonder why most tools are gff2gtf and not the other way around. I guess these tools can do gtf2gff as well. I am going to check AGAT.


Can agat_sp_ensembl_output_style.pl do this: -g non-ensembl.gtf [ -o ensembl_like.gtf ]?


Potentially yes, but you have to use AGAT >=1.0.0 and modify the AGAT config to specify that you want GTF output.
