7 weeks ago
manaswiniparija3
I have predicted the whole-genome annotation of an organism with MetaEuk and produced GFF files, but featureCounts does not recognize these files.
I am getting the following error:
|| Load annotation file dataset_fd73837e-381d-4b9f-bd69-503c68d90c19.dat ... ||
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id '
An example of attributes included in your GTF annotation is 'gene_id XP_049336233.1'.
My file looks like this:
CM033892.1 gene MetaEuk 54523435 54530177 656 + . gene_id "XP_049336233.1"; transcript_id "XP_049336233.1"; protein_name "XP_049336233.1 uncharacterized protein C14orf93-like [Astyanax mexicanus]";
Can anyone help me figure out what is wrong with the GFF format? (I am using usegalaxy for differential gene expression analysis.)
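A quick way to narrow this down is to check the real file (not a copy pasted into a browser, which often turns tabs into spaces) for three things: nine tab-separated columns, a feature type in column 3, and a `gene_id "value";` pair in column 9. Note that in the line as pasted, column 2 reads `gene` and column 3 reads `MetaEuk`, whereas GTF puts the source in column 2 and the feature type in column 3; this may only be a copy-paste artifact, but it is worth checking. The sketch below assumes the GTF2.2 attribute layout; `check_gtf_line`, `check_file` and the `KNOWN_TYPES` set are hypothetical helpers, not part of featureCounts or MetaEuk.

```python
import re

# GTF column 9 is a list of  key "value";  pairs (GTF2.2 convention).
ATTR_RE = re.compile(r'([^\s";]+)\s+"([^"]*)"\s*;?')

# Assumption: feature types we expect to see in column 3; extend as needed.
KNOWN_TYPES = {"gene", "transcript", "mRNA", "exon", "CDS", "UTR",
               "five_prime_utr", "three_prime_utr", "start_codon", "stop_codon"}


def parse_attributes(column9: str) -> dict:
    """Parse the 9th GTF column into a {key: value} dict."""
    return {m.group(1): m.group(2) for m in ATTR_RE.finditer(column9)}


def check_gtf_line(line: str) -> list:
    """Return the problems found in one GTF line (an empty list means it looks OK)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Tabs replaced by spaces is a common cause of this.
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    problems = []
    source, feature = fields[1], fields[2]
    if feature not in KNOWN_TYPES:
        problems.append(f"column 3 (feature type) is {feature!r} and "
                        f"column 2 (source) is {source!r}; are they swapped?")
    if "gene_id" not in parse_attributes(fields[8]):
        problems.append("no gene_id attribute in column 9")
    return problems


def check_file(path: str) -> list:
    """Check every non-comment line of a GTF file; return (line_number, problem) pairs."""
    report = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue
            report.extend((lineno, problem) for problem in check_gtf_line(line))
    return report
```

For example, `check_file("annotation.gtf")` (hypothetical file name) returns an empty list if every line passes; otherwise each entry points to the offending line number.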
That's not enough; show us the lines associated with this gene (transcript, exons, CDS).
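To collect every line that belongs to one gene (the gene line plus its transcript, exon and CDS lines) for posting, filtering on the exact `gene_id` value is enough. This is a minimal sketch, assuming a tab-separated GTF in which each line carries a `gene_id "value"` attribute; `lines_for_gene` is a hypothetical helper. Matching the whole quoted value, rather than a substring, keeps `XP_1` from also pulling in `XP_10`.

```python
import re

# gene_id must appear as a key (start of column 9 or right after a ';').
GENE_ID_RE = re.compile(r'(?:^|;)\s*gene_id\s+"([^"]*)"')


def lines_for_gene(lines, gene_id):
    """Yield every GTF line whose gene_id attribute equals gene_id exactly."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip comments, blank lines and malformed rows
        match = GENE_ID_RE.search(fields[8])
        if match and match.group(1) == gene_id:
            yield line.rstrip("\n")
```

Usage: `with open("annotation.gtf") as fh: print("\n".join(lines_for_gene(fh, "XP_049336233.1")))`. On the command line, `grep -F 'gene_id "XP_049336233.1"' annotation.gtf` does the same job.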
This is not GFF but GTF. One rule this GTF does not follow is that a gene feature should not carry a transcript_id attribute. Removing it might help.
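Following that suggestion, the sketch below drops the `transcript_id` attribute from lines whose feature type (column 3) is `gene` and leaves every other line untouched. It assumes a tab-separated GTF with the feature type in column 3, as the format specifies; `strip_transcript_id_from_genes`, `rewrite_gtf` and the file paths are hypothetical. Whether this change alone clears the featureCounts error is not confirmed in this thread.

```python
import re

# One  transcript_id "...";  pair, plus the whitespace before it.
TRANSCRIPT_ID_RE = re.compile(r'\s*transcript_id\s+"[^"]*"\s*;?')


def strip_transcript_id_from_genes(line: str) -> str:
    """Remove transcript_id from 'gene' lines; return other lines unchanged (no newline)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "gene":
        return line.rstrip("\n")
    fields[8] = TRANSCRIPT_ID_RE.sub("", fields[8]).strip()
    return "\t".join(fields)


def rewrite_gtf(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with transcript_id removed from gene lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith("#"):
                dst.write(line)  # keep header/comment lines as they are
            else:
                dst.write(strip_transcript_id_from_genes(line) + "\n")
```

If columns 2 and 3 really are swapped in the file, the `fields[2] != "gene"` test will skip those lines, so fix the column order first.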
Would it be possible to post the command you used to run featureCounts (either the command line, or the settings you are using in Galaxy, if I understood correctly)? It could be that the extra space in 'gene_id ' is causing the issue. Can you try leaving the GFF gene identifier field empty and rerunning it?
I have tried it, but it is still not working.