First of all, I know I should ask this on GitHub, but the StringTie2 contributors do not seem to be open to questions about this issue. Here is the problem I found. I want the GFF output so I can investigate annotation, coverage, and abundance. However, it gives me an error like this:
WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
My input is a sorted BAM file of Nanopore long reads from a prokaryote's whole transcriptome, mapped with minimap2 using the cDNA (CDS) FASTA file as the reference. I have used references from both Ensembl and NCBI, but both gave me the same error. I also tried Galaxy, and it still gives me the same result.
Which step have I missed?
Here are the commands I used:
minimap2 -a -x map-ont [myref.fa] [my.fastq] | samtools view -b - -o [my.bam]
samtools sort -T tmp -o [my.sorted.bam] [my.bam] && samtools index [my.sorted.bam]
stringtie -L [my.sorted.bam] -G [ref.gtf] -o [my.gtf]
Any suggestions? Maybe about the parameters or options I should optimize for Nanopore data. Thanks in advance.
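Given the warning above, one thing worth checking is whether the sequence names in the BAM header match the first column of the GTF. Because the reads were mapped to a cDNA/CDS FASTA, the BAM reference names are presumably transcript or CDS IDs, whereas a genome annotation GTF normally uses chromosome or contig names. Below is a minimal sketch of that check. The function name `missing_in_gtf` and the file names are hypothetical, and it assumes the header was first dumped to a text file with `samtools view -H`.

```shell
# Sketch: print sequence names found in a SAM header but absent from a GTF.
# Assumed preparation (placeholder file names):
#   samtools view -H my.sorted.bam > header.sam
missing_in_gtf() {
    header="$1"   # text file containing the SAM header
    gtf="$2"      # annotation passed to stringtie -G
    workdir=$(mktemp -d)

    # @SQ lines carry each reference name in the SN: field
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' "$header" | sort -u > "$workdir/bam.names"

    # Column 1 of every non-comment GTF line is the sequence name
    awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$gtf" | sort -u > "$workdir/gtf.names"

    # Names present only in the BAM header
    comm -23 "$workdir/bam.names" "$workdir/gtf.names"
    rm -rf "$workdir"
}

# Usage (placeholder file names):
#   missing_in_gtf header.sam ref.gtf
```

If every BAM reference name is printed, the alignment and the annotation do not share a coordinate system, which would be consistent with the warning.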
Why are you going to all this trouble when you have the simplest possible case of RNAseq?
You have a prokaryotic genome, which should have no splicing. You are using long reads, so even for the longest genes your reads should already cover the entire gene. Can you tell us how these libraries were made? Was any fragmentation done after conversion of RNA to cDNA, or were you sequencing RNA directly (which is possible with Nanopore)? Are you seeing polycistronic reads (reads that cover multiple genes) in your data? They will show up as multi-mapping/secondary alignments if you aligned to just the CDS FASTA.
Are you simply trying to replicate the analysis they report? I looked at the manuscript briefly, and the data analysis is described in sufficient detail to allow replication.
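As a rough way to answer the polycistronic-read question, the SAM FLAG field can be tallied for secondary (0x100) and supplementary (0x800) alignments. This is only a sketch: the function name `count_flags` is hypothetical, it reads SAM text on standard input, and whether polycistronic reads actually appear as secondary or supplementary records depends on the aligner settings.

```shell
# Sketch: count total, secondary, and supplementary alignment records in SAM text.
count_flags() {
    awk -F'\t' '
        /^@/ { next }                                    # skip header lines if present
        {
            total++
            if (int($2 / 256) % 2 == 1) secondary++      # FLAG bit 0x100 set
            if (int($2 / 2048) % 2 == 1) supplementary++ # FLAG bit 0x800 set
        }
        END {
            printf "total=%d secondary=%d supplementary=%d\n", total, secondary, supplementary
        }
    '
}

# Usage (placeholder file name):
#   samtools view my.sorted.bam | count_flags
```

A large share of secondary or supplementary records relative to the total would suggest that many reads align to more than one CDS.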