Hi everybody! I am using an RNA-Seq protocol to identify differentially expressed genes in RNA-Seq data from 6 wheat varieties. I am following this protocol: https://www.nature.com/articles/nprot.2016.095.
My reference genome source is as follows. I downloaded the assembled sequences from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/519/105/GCA_900519105.1_iwgsc_refseqv1.0, specifically the file GCA_900519105.1_iwgsc_refseqv1.0_genomic.fna.gz, which contains the FASTA-formatted sequences of the chromosomes.
My GTF file, on the other hand, comes from ftp://ftp.ensemblgenomes.org/pub/release-42/plants/gtf/triticum_aestivu
After the HISAT step, I pass the sorted BAM files to StringTie to get the assembled transcripts. I am using the following command:
./stringtie G1_sorted.bam -G Triticum_aestivum.IWGSC.42.gtf -l G1-Label -o G1_ST.gtf -p 15
and the following error appears:
WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences
Can someone please suggest a solution?
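The warning itself points to a first check: compare the sequence names in the BAM header with those in column 1 of the GTF. Below is a minimal sketch, assuming samtools is installed and the files are named as in the command above. The helper names gtf_seqnames and sam_seqnames and the output files bam_names.txt and gtf_names.txt are made up for illustration.

```shell
# Hypothetical helper: unique sequence names from a GTF read on stdin
# (column 1, with '#' comment lines skipped).
gtf_seqnames() {
    grep -v '^#' | cut -f1 | sort -u
}

# Hypothetical helper: sequence names from SAM header text read on stdin
# (the SN: field of each @SQ line).
sam_seqnames() {
    grep '^@SQ' | tr '\t' '\n' | grep '^SN:' | cut -c4- | sort -u
}

# Runs only where samtools and the files from the question are present.
if command -v samtools >/dev/null 2>&1 \
   && [ -f G1_sorted.bam ] && [ -f Triticum_aestivum.IWGSC.42.gtf ]; then
    samtools view -H G1_sorted.bam | sam_seqnames > bam_names.txt
    gtf_seqnames < Triticum_aestivum.IWGSC.42.gtf > gtf_names.txt
    # Print the names that occur in both files.
    comm -12 bam_names.txt gtf_names.txt
fi
```

If the last command prints nothing, the two files share no sequence names, which is the situation the warning describes.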
It is an issue with the GTF file downloaded from NCBI. Check this post:
Warning encountered while transcript abundance estimation using stringtie
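If the sequence names in the GTF and in the BAM do differ, one possible workaround is to rewrite column 1 of the GTF so that it matches the names used in the FASTA/BAM. This is a rough sketch only, assuming you have built a two-column, tab-separated mapping file (old name, new name) yourself. The function name rename_gtf_seqnames, the file name_map.tsv, and the output file name are hypothetical.

```shell
# Hypothetical helper: rewrite column 1 of a GTF using a mapping file.
#   $1 = tab-separated mapping file: old_name <TAB> new_name
#   $2 = input GTF
# Comment lines and sequence names missing from the map pass through unchanged.
rename_gtf_seqnames() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR  { map[$1] = $2; next }   # first file: load the mapping
        /^#/       { print; next }          # keep GTF header comments as-is
        ($1 in map) { $1 = map[$1] }        # swap in the new sequence name
        { print }
    ' "$1" "$2"
}

# Example call (file names are assumptions):
# rename_gtf_seqnames name_map.tsv Triticum_aestivum.IWGSC.42.gtf > Triticum_aestivum.IWGSC.42.renamed.gtf
```

Renaming only makes sense when the FASTA and the GTF describe the same assembly, so that the coordinates agree. Otherwise, taking the genome and the annotation from the same source and release is the safer route.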
Can you please suggest a suitable reference genome and GTF/GFF3 file for wheat that avoid this error? I would be very thankful if you could provide links, since I have tried many reference genomes (including a few top-level ones), but every time I got the same error.
I am getting the same error. Did you solve it, and if so, how?