StringTie error: Error: no valid ID found for GFF record
2.1 years ago
wormery • 0

I have reached the StringTie step of the Nextflow RNA-seq pipeline and I keep getting this error message.

Command error:
 Running StringTie 2.2.1. Command line:
 stringtie sample1.sorted.bam --fr -G reference_genomic.gtf -o sample1.transcripts.gtf -A sample1.gene.abundance.txt -C sample1.coverage.gtf -b sample1.ballgown -p 4 -v -e
 Loading reference annotation (guides)..
Error: no valid ID found for GFF record

I am not too familiar with GFF files, so any help would be really appreciated. I have tried using AGAT (as suggested in other posts). If sorting is the issue, I cannot use gff3sort as I do not have root access.

I'm not convinced sorting is the issue anyway, as Nextflow generated the GFF from my GTF file. Does anyone have any suggestions?

Thanks in advance. Here is a sample of my gff.

ABKE04000044.1  Genbank gene    11      11446   .       +       .       ID=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true 

ABKE04000044.1  Genbank mRNA    11      11446   .       +       .       ID=gnl|WGS:ABKE|PRIPAC_mrna91469;Parent=nbis-gene-28068;gbkey=Gene;gene_biotype=protein_coding;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;partial=true;transcript_id=""

ABKE04000044.1  Genbank exon    11      59      .       +       .       ID=nbis-exon-322458;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=1;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469

ABKE04000044.1  Genbank exon    133     252     .       +       .       ID=nbis-exon-322459;Parent=gnl|WGS:ABKE|PRIPAC_mrna91469;exon_number=2;gene_id=PRIPAC_91469;locus_tag=PRIPAC_91469;orig_protein_id=gnl|WGS:ABKE|PRIPAC_91469;orig_transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469;partial=true;product=hypothetical protein;transcript_biotype=mRNA;transcript_id=gnl|WGS:ABKE|PRIPAC_mrna91469

UPDATE

We have found the .gff file generated from the .gtf was inaccurate. We are troubleshooting with this new information - if it works, I will post the solution.
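
For anyone checking a converted file in the same way: one thing visible in the sample above is that the mRNA line carries transcript_id="" while the exon lines have a real value. Whether that is what StringTie objects to is an assumption, not something confirmed here, but empty identifiers are easy to scan for. Below is a minimal Python sketch; the function names and the list of attribute keys are made up for illustration and are not part of StringTie, AGAT, or the pipeline. It assumes GFF3-style key=value attributes in a tab-separated file.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, sep, value = field.partition("=")
        # A field without "=" has no value at all; record it as None.
        attrs[key] = value if sep else None
    return attrs


def find_empty_attributes(lines, keys=("ID", "Parent", "gene_id", "transcript_id")):
    """Return (line_number, feature_type, message) for suspicious records.

    The keys checked are an illustrative choice, not a list taken from
    the StringTie documentation.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((line_no, "?", "expected 9 tab-separated columns"))
            continue
        attrs = parse_gff3_attributes(cols[8])
        for key in keys:
            # Treat a missing value, an empty value, and a literal "" as empty.
            if key in attrs and attrs[key] in (None, "", '""'):
                problems.append((line_no, cols[2], f"{key} is empty"))
    return problems


def check_file(path):
    """Print one tab-separated line per problem found in the file at `path`."""
    with open(path) as handle:
        for problem in find_empty_attributes(handle):
            print(*problem, sep="\t")
```

Calling check_file("reference_genomic.gff") (file name hypothetical) would print one line per flagged record. If the file was pasted with spaces instead of tabs, every line will be reported as not having 9 columns, which is worth knowing in itself.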

rnaseq gff stringtie nextflow transcriptomics • 1.6k views

You could try converting the file to GTF and running StringTie again. I wonder whether StringTie is confused because the file has a .gtf extension but its content mixes both formats (ID/Parent relationships from GFF plus gene_id/transcript_id relationships from GTF). In GTF, the gene_id attribute should come first in the 9th column, which is not the case here, so that might be the issue.
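
To see quickly which attribute style a file actually uses, regardless of its extension, something like the following Python sketch may help. It is a rough heuristic under stated assumptions: the function names are hypothetical, it splits naively on ";" (so a quoted GTF value containing a semicolon would be misread), and it only looks at the 9th column.

```python
import re

# GFF3 style: key=value
GFF3_PAIR = re.compile(r'^\s*[^=\s]+=.*$')
# GTF style: key "value" (or an unquoted token such as a number)
GTF_PAIR = re.compile(r'^\s*[^=\s]+\s+("[^"]*"|\S+)\s*$')


def attribute_style(column9):
    """Classify an attribute column as 'gff3', 'gtf', 'mixed-or-unknown' or 'unknown'."""
    fields = [f for f in column9.strip().split(";") if f.strip()]
    if not fields:
        return "unknown"
    if all(GFF3_PAIR.match(f) for f in fields):
        return "gff3"
    if all(GTF_PAIR.match(f) for f in fields):
        return "gtf"
    return "mixed-or-unknown"


def first_key(column9):
    """Return the name of the first attribute, or None if the column is empty."""
    for field in column9.strip().split(";"):
        field = field.strip()
        if field:
            # The key ends at the first "=" (GFF3) or whitespace (GTF).
            return re.split(r"[=\s]", field, maxsplit=1)[0]
    return None
```

On the exon lines in the question, attribute_style reports "gff3" and first_key reports "ID", so the content is GFF3-style even though gene_id and transcript_id attributes are also present.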
