I have reached the StringTie section of the Nextflow RNA-seq pipeline and I keep receiving this error message:
Command error:
Running StringTie 2.2.1. Command line:
stringtie sample1.sorted.bam --fr -G reference_genomic.gtf -o sample1.transcripts.gtf -A sample1.gene.abundance.txt -C sample1.coverage.gtf -b sample1.ballgown -p 4 -v -e
Loading reference annotation (guides)..
Error: no valid ID found for GFF record
I am not too familiar with GFF files, so any help would be really appreciated. I have tried using AGAT (as suggested in other posts). If sorting is the issue, I cannot use gff3sort as I do not have root access.
I'm not convinced sorting is the issue anyway, as nextflow generated the gff from my gtf file - does anyone have any suggestions?
Thanks in advance. Here is a sample of my gff.
ABKE04000044.1 Genbank gene 11 11446 . + . ID=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true
ABKE04000044.1 Genbank mRNA 11 11446 . + . ID=gnl|WGS:ABKE|PRIPAC_mrna91469;Parent=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true;transcript_id=""
ABKE04000044.1 Genbank exon 11 59 . + . ID=nbis-exon-322458;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=1;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469
ABKE04000044.1 Genbank exon 133 252 . + . ID=nbis-exon-322459;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=2;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469
UPDATE
We have found the .gff file generated from the .gtf was inaccurate. We are troubleshooting with this new information - if it works, I will post the solution.
You could try converting the file to GTF and running it again. I wonder whether StringTie is confused because the file has a gtf extension while its content is GXF, i.e. a mix of both formats (the ID/Parent relationship from GFF plus the gene_id/transcript_id relationship from GTF). In GTF, the gene_id attribute should come first in the 9th column, which is not the case here, so that might be the issue.
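If it helps, here is a minimal Python sketch for spotting the records worth looking at before converting. It is not part of StringTie, AGAT or the pipeline, and I can't say for sure which of these conditions triggers the error; it only flags the things mentioned above (mixed ID/Parent and gene_id/transcript_id attributes, gene_id not being first) plus empty attribute values such as the transcript_id="" on the mRNA line in your sample. The function names are my own, and it assumes the real file is tab-separated with the attributes in the 9th column.

```python
def parse_attributes(column9):
    """Split a 9th-column string into an ordered list of (key, value) pairs.

    Accepts GFF3 style (key=value) and GTF style (key "value").
    Simplification: a GTF value that itself contains '=' would be mis-split.
    """
    pairs = []
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        if "=" in field:
            key, value = field.split("=", 1)
        else:
            key, _, value = field.partition(" ")
        pairs.append((key.strip(), value.strip().strip('"')))
    return pairs


def check_record(line):
    """Return a list of warnings for one annotation line (empty if none)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return ["expected 9 tab-separated columns, found %d" % len(columns)]
    pairs = parse_attributes(columns[8])
    keys = [key for key, _ in pairs]
    warnings = []
    empty = [key for key, value in pairs if value == ""]
    if empty:
        warnings.append("empty value for: " + ", ".join(empty))
    has_gff = "ID" in keys or "Parent" in keys
    has_gtf = "gene_id" in keys or "transcript_id" in keys
    if has_gff and has_gtf:
        warnings.append("mixes GFF (ID/Parent) and GTF (gene_id/transcript_id) attributes")
    if has_gtf and keys[0] != "gene_id":
        warnings.append("gene_id is not the first attribute")
    return warnings


def scan(path):
    """Yield (line_number, warnings) for every flagged record in a file."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives and blank lines
            warnings = check_record(line)
            if warnings:
                yield number, warnings
```

You would call it as `for number, warnings in scan("reference_genomic.gtf"): print(number, warnings)`, using the annotation file name from your command line. It only needs the standard library, so no root access is required.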