Question

gff to gtf missing gene id

19 months ago

plain_text • 0

Hi,

I was trying my hand at annotating a genome using Prokka, and I converted the output GFF file to GTF with gffread (gffread file.gff -T -o file.gtf). This is what my GTF file looks like:

CP001095.1  prokka  transcript  210 1712    .   +   .   transcript_id "LCLPEOGO_00001_gene"; gene_id "LCLPEOGO_00001_gene"; gene_name "dnaA"
CP001095.1  prokka  CDS 210 1712    .   +   0   transcript_id "LCLPEOGO_00001_gene"; gene_name "dnaA";
CP001095.1  prokka  transcript  2447    3571    .   +   .   transcript_id "LCLPEOGO_00002_gene"; gene_id "LCLPEOGO_00002_gene"; gene_name "dnaN_1"
CP001095.1  prokka  CDS 2447    3571    .   +   0   transcript_id "LCLPEOGO_00002_gene"; gene_name "dnaN_1";

Every second line (the CDS lines) is missing the gene_id, and the GTF format descriptions online look different from mine. Is there something wrong with my output, or can I continue to work with it? I would really like to use it in featureCounts. (Sorry in advance if this is a really noob question. Any help is appreciated xx)
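To see exactly which feature types lack a gene_id, a quick check along these lines may help. It is only a sketch: the function name `missing_gene_id` is made up for illustration, and it assumes the file is tab-separated with the attributes in the ninth column, as GTF normally is.

```python
import re
from collections import Counter

# Matches a gene_id attribute such as: gene_id "LCLPEOGO_00001_gene"
GENE_ID = re.compile(r'\bgene_id "[^"]*"')

def missing_gene_id(gtf_lines):
    """Count, per feature type (column 3), the records with no gene_id attribute."""
    missing = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        if not GENE_ID.search(fields[8]):
            missing[fields[2]] += 1
    return missing

# Two records taken from the gffread output above (tabs assumed as separators).
example = [
    'CP001095.1\tprokka\ttranscript\t210\t1712\t.\t+\t.\t'
    'transcript_id "LCLPEOGO_00001_gene"; gene_id "LCLPEOGO_00001_gene"; gene_name "dnaA"',
    'CP001095.1\tprokka\tCDS\t210\t1712\t.\t+\t0\t'
    'transcript_id "LCLPEOGO_00001_gene"; gene_name "dnaA";',
]
print(missing_gene_id(example))  # Counter({'CDS': 1})
```

On a real file, passing an open file handle (for example `missing_gene_id(open("file.gtf"))`) should work the same way, since the function only iterates over lines.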

transcriptomics annotation • 1.7k views

updated 19 months ago by Ram 44k • written 19 months ago by plain_text • 0

This seems to be a known issue with GFF files generated by Prokka. See https://github.com/tseemann/prokka/issues/338 and https://github.com/gpertea/gffread/issues/45

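One could specify -t CDS in the featureCounts command (matching the feature type as it appears in column 3) to calculate the raw counts for each CDS. Since the CDS lines carry no gene_id, which is the default grouping attribute, you would also need -g transcript_id.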

19 months ago by Sej Modha 5.3k

Thanks for the suggestion. If nothing else works, I'll just specify CDS and see what happens :)

19 months ago by plain_text • 0

score 3 · Accepted Answer · 2023-04-11

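You can fix that using AGAT. Running its GFF-to-GTF converter on the file gives every feature line a gene_id, and also adds gene and exon features (output shown below the command):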

agat_convert_sp_gff2gtf.pl --gff test.gtf 

##gtf-version 3
CP001095.1  prokka  gene    210 1712    .   +   .   gene_id "LCLPEOGO_00001_gene"; transcript_id "LCLPEOGO_00001_gene"; ID "nbisL1-gene-1"; gene_name "dnaA";
CP001095.1  prokka  transcript  210 1712    .   +   .   gene_id "LCLPEOGO_00001_gene"; transcript_id "LCLPEOGO_00001_gene"; ID "LCLPEOGO_00001_gene"; Parent "nbisL1-gene-1"; gene_name "dnaA";
CP001095.1  prokka  exon    210 1712    .   +   .   gene_id "LCLPEOGO_00001_gene"; transcript_id "LCLPEOGO_00001_gene"; ID "nbis-exon-1"; Parent "LCLPEOGO_00001_gene"; gene_name "dnaA";
CP001095.1  prokka  CDS 210 1712    .   +   0   gene_id "LCLPEOGO_00001_gene"; transcript_id "LCLPEOGO_00001_gene"; ID "cds-1"; Parent "LCLPEOGO_00001_gene"; gene_name "dnaA";
CP001095.1  prokka  gene    2447    3571    .   +   .   gene_id "LCLPEOGO_00002_gene"; transcript_id "LCLPEOGO_00002_gene"; ID "nbisL1-gene-2"; gene_name "dnaN_1";
CP001095.1  prokka  transcript  2447    3571    .   +   .   gene_id "LCLPEOGO_00002_gene"; transcript_id "LCLPEOGO_00002_gene"; ID "LCLPEOGO_00002_gene"; Parent "nbisL1-gene-2"; gene_name "dnaN_1";
CP001095.1  prokka  exon    2447    3571    .   +   .   gene_id "LCLPEOGO_00002_gene"; transcript_id "LCLPEOGO_00002_gene"; ID "nbis-exon-2"; Parent "LCLPEOGO_00002_gene"; gene_name "dnaN_1";
CP001095.1  prokka  CDS 2447    3571    .   +   0   gene_id "LCLPEOGO_00002_gene"; transcript_id "LCLPEOGO_00002_gene"; ID "cds-2"; Parent "LCLPEOGO_00002_gene"; gene_name "dnaN_1";
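If installing AGAT is not an option, a stopgap could be to copy the gene_id from each transcript line onto the lines that share its transcript_id. The sketch below is not what AGAT does internally (it adds no gene or exon features); the function name `fill_gene_ids` is hypothetical, and it assumes a tab-separated file in which at least one line per transcript carries both IDs, as in the gffread output in the question.

```python
import re

TRANSCRIPT_ID = re.compile(r'\btranscript_id "([^"]*)"')
GENE_ID = re.compile(r'\bgene_id "([^"]*)"')

def fill_gene_ids(gtf_lines):
    """Return the lines with gene_id added wherever it can be inferred."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: learn transcript_id -> gene_id from lines that have both.
    tx_to_gene = {}
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        tx = TRANSCRIPT_ID.search(fields[8])
        gene = GENE_ID.search(fields[8])
        if tx and gene:
            tx_to_gene[tx.group(1)] = gene.group(1)

    # Second pass: prepend gene_id to records that lack it.
    fixed = []
    for line in lines:
        fields = line.split("\t")
        is_record = not line.startswith("#") and len(fields) >= 9
        if is_record and not GENE_ID.search(fields[8]):
            tx = TRANSCRIPT_ID.search(fields[8])
            if tx and tx.group(1) in tx_to_gene:
                fields[8] = f'gene_id "{tx_to_gene[tx.group(1)]}"; {fields[8]}'
                line = "\t".join(fields)
        fixed.append(line)
    return fixed

# Two records from the gffread output in the question (tabs assumed).
example = [
    'CP001095.1\tprokka\ttranscript\t210\t1712\t.\t+\t.\t'
    'transcript_id "LCLPEOGO_00001_gene"; gene_id "LCLPEOGO_00001_gene"; gene_name "dnaA"',
    'CP001095.1\tprokka\tCDS\t210\t1712\t.\t+\t0\t'
    'transcript_id "LCLPEOGO_00001_gene"; gene_name "dnaA";',
]
for line in fill_gene_ids(example):
    print(line)
```

Lines whose transcript_id never appears alongside a gene_id are left untouched, so it is worth checking the result before passing it to featureCounts.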