Question

GFF/GTF file error / featureCounts

0

Entering edit mode

2.1 years ago

Pegasus ▴ 130

Hi all,

I am trying to generate a count.matrix for sorted bam files, using featureCounts on linux.

I have a non-modal organism (bacteria), so I generated the annotation.file using both PROKKA and RAST.

I used all the following files in featurecounts;

PROKKA.gff,

RAST.gff

RAST.gtf

gffread converted-PROKKA.gtf file

But still facing the same issue/error:

**ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..**

I've tried using different flags like -F

I've tried replacing gene_id by ID or locus_tag or gene

I've tried replacing IDs with gene_ids using sed command

I read all posts related to this problem, but I am still facing the same error.

Here is part of prokka.gff file;

caffold_1   Aragorn:001002  tRNA    1914    1989    .   +   .   ID=KKGOPEHD_00011;inference=COORDINATES:profile:Aragorn:001002;locus_tag=KKGOPEHD_00011;product=tRNA-Pro(tgg)
Scaffold_1  Prodigal:002006 CDS 2164    2808    .   +   0   ID=KKGOPEHD_00012;inference=ab initio prediction:Prodigal:002006;locus_tag=KKGOPEHD_00012;product=hypothetical protein
Scaffold_1  Prodigal:002006 CDS 3037    3540    .   +   0   ID=KKGOPEHD_00013;eC_number=3.1.22.4;Name=ruvC;db_xref=COG:COG0817;gene=ruvC;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A814;locus_tag=KKGOPEHD_00013;product=Crossover junction endodeoxyribonuclease RuvC
Scaffold_1  Prodigal:002006 CDS 3537    4151    .   +   0   ID=KKGOPEHD_00014;eC_number=3.6.4.12;Name=ruvA;db_xref=COG:COG0632;gene=ruvA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A809;locus_tag=KKGOPEHD_00014;product=Holliday junction ATP-dependent DNA helicase RuvA
Scaffold_1  Prodigal:002006 CDS 4202    5209    .   +   0   ID=KKGOPEHD_00015;eC_number=3.6.4.12;Name=ruvB_1;db_xref=COG:COG2255;gene=ruvB_1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:O32055;locus_tag=KKGOPEHD_00015;product=Holliday junction ATP-dependent DNA helicase RuvB

Here is part of RAST.gff file;

Scaffold_1  FIG CDS 762 875 .   -   0   ID=fig|6666666.1005592.peg.1;Name=hypothetical protein
Scaffold_1  FIG CDS 2131    2808    .   +   1   ID=fig|6666666.1005592.peg.2;Name=BofC protein
Scaffold_1  FIG CDS 3037    3540    .   +   1   ID=fig|6666666.1005592.peg.3;Name=Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4);Ontology_term=KEGG_ENZYME:3.1.22.4
Scaffold_1  FIG CDS 3537    4151    .   +   0   ID=fig|6666666.1005592.peg.4;Name=Holliday junction ATP-dependent DNA helicase RuvA (EC 3.6.4.12);Ontology_term=KEGG_ENZYME:3.6.4.12
Scaffold_1  FIG CDS 4202    5209    .   +   2   ID=fig|6666666.1005592.peg.5;Name=Holliday junction ATP-dependent DNA helicase RuvB (EC 3.6.4.12);Ontology_term=KEGG_ENZYME:3.6.4.12
Scaffold_1  FIG CDS 5438    7531    .   +   2   ID=fig|6666666.1005592.peg.6;Name=hypothetical protein
Scaffold_1  FIG CDS 7539    8567    .   +   0   ID=fig|6666666.1005592.peg.7;Name=S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17);Ontology_term=KEGG_ENZYME:2.4.99.17
Scaffold_1  FIG CDS 8588    9727    .   +   2   ID=fig|6666666.1005592.peg.8;Name=Queuine tRNA-ribosyltransferase (EC 2.4.2.29);Ontology_term=KEGG_ENZYME:2.4.2.29
Scaffold_1  FIG CDS 9762    10097   .   +   0   ID=fig|6666666.1005

Many thanks

RNA-SEQ • 2.0k views

ADD COMMENT • link updated 2.1 years ago by barslmn ★ 2.4k • written 2.1 years ago by Pegasus ▴ 130

1

Entering edit mode

Could you also post the command you're using? Are you specifying the feature and attribute you want with e.g. -t CDS -g ID?

ADD REPLY • link 2.1 years ago by barslmn ★ 2.4k

0

Entering edit mode

Hi barslmn,

here is the command line; should I include the gene_length in the command line?

featureCounts -p -t gene -g gene_id -a Rast.gff -o FCcount.default.txt md1.sorted.bam md2.sorted.bam md3.sorted.bam md4.sorted.bam md5.sorted.bam mc1.sorted.bam mc2.sorted.bam mc3.sorted.bam mc4.sorted.bam mc5.sorted.bam

ADD REPLY • link 2.1 years ago by Pegasus ▴ 130

0

Entering edit mode

Your gff snippets doesn't have the feature gene nor the attribute gene_id. Are these in your files?

ADD REPLY • link 2.1 years ago by barslmn ★ 2.4k

0

Entering edit mode

Unfortunately No, I have only the product and the ID numbers. I am still confused why both annotation tools did not include such data in the generated gff file.

ADD REPLY • link 2.1 years ago by Pegasus ▴ 130

0

Entering edit mode

Can you test with -t CDS and -g ID the ones you have in your annotation file?

ADD REPLY • link 2.1 years ago by barslmn ★ 2.4k