I am running a transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown, following the steps in the paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". After mapping the reads to the reference sequences, the StringTie step gives the warning "no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences." The code I run is:
# Assemble transcripts for each sample listed in samples.list (one name per line)
for sample_name in $(cat samples.list)
do
    stringtie -p 8 -G sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf -o "stringtie/$sample_name.gtf" -l "$sample_name" "hist2/JGI/$sample_name.bam"
done
This produces a .gtf file for each sample, and the results can be used for abundance estimation with Ballgown, but the abundance of every reference transcript is zero; the only transcripts with non-zero abundance are novel ones. I do not think this result is reliable, and the warning may be the cause. My reference sequences were downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Sorghum_bicolor/Assembled_chromosomes/seq/), and the annotation was also downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Sorghum_bicolor/GFF/ref_Sorghum_bicolor_NCBIv3_top_level.gff3.gz). I converted the annotation to GTF format with the command
gffread sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gff3 -T -o sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf
I then used gffread to check the GTF file:
gffread -E sorghum/genes/ref_Sorghum_bicolor_NCBIv3_top_level.gtf
It reports no errors, and I am sure my annotation file uses the same naming convention as the genome sequences. Why does the warning appear, and why can I not get abundances for the reference transcripts?
I have the same problem. Did you find out how to solve the issue?
I have the same problem, and I don't know how to solve it.
There might be a discrepancy between the sequence names used for mapping (those in the BAM file) and the sequence names in the GFF/GTF file. Alternatively, the specified GTF file may lack the identifiers needed to extract transcript information (for instance gene_id, name, etc.).
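Both possibilities can be checked from the command line. The following is only a rough sketch, not a definitive diagnostic: the function names are made up for illustration, the file names in the comments are placeholders, and it assumes that samtools is installed and that StringTie groups reference exons by the transcript_id attribute.

```shell
# Sequence names from SAM header text on stdin (the SN: field of each @SQ line).
bam_seq_names() {
    awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | LC_ALL=C sort -u
}

# Sequence names from column 1 of a GTF/GFF file, skipping comment and blank lines.
gtf_seq_names() {
    awk -F'\t' '!/^#/ && NF > 1 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Number of transcript/exon lines in a GTF that carry no transcript_id attribute.
gtf_missing_transcript_id() {
    awk -F'\t' '!/^#/ && ($3 == "transcript" || $3 == "exon") && $9 !~ /transcript_id "/ { n++ } END { print n + 0 }' "$1"
}

# Example (placeholder file names; samtools is assumed to be on the PATH):
#   samtools view -H sample.bam | bam_seq_names > bam_names.txt
#   gtf_seq_names annotation.gtf > gtf_names.txt
#   LC_ALL=C comm -12 bam_names.txt gtf_names.txt   # names present in both files
#   gtf_missing_transcript_id annotation.gtf        # ideally prints 0
```

If the comm command prints nothing, the BAM file and the annotation share no sequence names, which would be consistent with the warning. In that case it helps to look at the two name lists side by side to see how they differ.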