Genome annotation and GFF file format
5 weeks ago

I predicted a whole-genome annotation for an organism and produced GFF files using MetaEuk, but featureCounts does not recognize these files.

I am getting the error below:

||                                                                            ||
|| Load annotation file dataset_fd73837e-381d-4b9f-bd69-503c68d90c19.dat ...  ||

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id ' 
An example of attributes included in your GTF annotation is 'gene_id XP_049336233.1'.

My file looks like this:

CM033892.1  gene    MetaEuk 54523435    54530177    656 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";

Can anyone help me figure out what is wrong with the GFF format? (I am using usegalaxy for differential gene expression analysis.)

MetaEuk GFF whole-genome-annotation • 523 views

my file looks like below

That's not enough; show us the lines associated with this gene (transcript, exons, CDS).

CM033892.1  gene    MetaEuk 54523435    54530177    656 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  mRNA    MetaEuk 54523435    54530177    656 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54523435    54523623    136 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54523435    54523623    136 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54525354    54525521    107 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54525357    54525521    107 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54525629    54525694    52  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54525635    54525694    52  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54526582    54526692    75  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54526582    54526692    75  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54527102    54527212    75  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54527117    54527212    75  +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  exon    MetaEuk 54529863    54530177    213 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  CDS MetaEuk 54529869    54530177    213 +   .   gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
CM033892.1  gene    MetaEuk 85040217    85046198    1319    +   .   gene_id "XP_049334361.1"; transcript_id "XP_049334361.1"; protein_name "XP_049334361.1 uncharacterized protein LOC125801630 [Astyanax mexicanus]";
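
Lines like these can be sanity-checked against the GTF column layout (seqname, source, feature, start, end, score, strand, frame, attributes). The following is a minimal sketch, not a full validator: the function name `check_gtf_line` and the set of feature names are assumptions made for illustration. As posted, the second and third fields read `gene` and `MetaEuk`; whether that reflects the actual file or is an artifact of pasting is not clear from the thread, so the check only flags it as something to look at.

```python
def check_gtf_line(line):
    """Return a list of problems found in one tab-separated GTF data line.

    Hypothetical helper for illustration; it checks only a few basics.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]

    problems = []
    source, feature, start, end = fields[1], fields[2], fields[3], fields[4]

    # Assumed set of feature names, taken from the lines shown in the thread.
    known_features = {"gene", "transcript", "mRNA", "exon", "CDS"}
    if feature not in known_features and source in known_features:
        problems.append("columns 2 and 3 look swapped (feature type is in the source column)")

    if not (start.isdigit() and end.isdigit() and int(start) <= int(end)):
        problems.append("start/end are not valid coordinates")

    if 'gene_id "' not in fields[8]:
        problems.append("no quoted gene_id attribute in column 9")

    return problems
```

Running this over the first few lines of the real file (read with tabs intact, not copied from a web page) would show whether the column order in the post matches what is on disk.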

This is GTF, not GFF. One GTF rule this file breaks is that the gene feature carries a transcript_id attribute. Removing it might help.
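
A minimal sketch of that suggestion, assuming a tab-separated file with quoted attribute values as in the posted lines. The function name `drop_transcript_id_on_gene_rows` and the `feature_col` parameter are hypothetical; `feature_col` is there because the posted lines show the feature type in the second column, while standard GTF has it in the third.

```python
import re

# Matches one attribute of the form: transcript_id "value";
TRANSCRIPT_ID = re.compile(r'\s*transcript_id\s+"[^"]*"\s*;')


def drop_transcript_id_on_gene_rows(lines, feature_col=2):
    """Yield GTF lines with transcript_id removed from 'gene' rows only.

    feature_col is the 0-based index of the feature-type column
    (2 in standard GTF; pass 1 if the file has it in the second column).
    Lines are yielded without a trailing newline.
    """
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Leave comments and malformed lines untouched.
        if line.startswith("#") or len(fields) != 9:
            yield line
            continue
        if fields[feature_col] == "gene":
            fields[8] = TRANSCRIPT_ID.sub("", fields[8]).strip()
        yield "\t".join(fields)
```

Writing the result to a new file and uploading that to Galaxy keeps the original annotation intact in case the change does not help.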


Could you post how you ran featureCounts (either the command line or, if I understood correctly, the settings you used in Galaxy)? The extra trailing space in 'gene_id ' might be causing the issue.
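
To illustrate why a trailing space could matter, here is a small sketch that parses the ninth column into a dictionary and looks up the key exactly as typed. The helper `parse_attributes` is hypothetical, and this only demonstrates exact string matching in general; the thread does not establish how featureCounts itself treats whitespace in the attribute name.

```python
import re


def parse_attributes(column9):
    """Parse GTF-style 'key "value";' pairs into a dict (illustrative only)."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"\s*;', column9))


column9 = (
    'gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; '
    'protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";'
)
attrs = parse_attributes(column9)

requested = "gene_id "                 # note the trailing space
found_as_typed = requested in attrs    # exact match on the key as typed
found_trimmed = requested.strip() in attrs
```

Here `found_as_typed` is false and `found_trimmed` is true, so if the Galaxy form field contains a stray space, retyping the attribute name is a cheap thing to rule out.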


enter image description here


Can you try leaving the GFF gene identifier field empty and rerunning it?


I have tried that, but it is still not working.
