Dear all,
I have nine RNA-Seq files that I aligned using the hisat2 aligner and this default command:
hisat2 -x grch37_snp_tran/genome_snp_tran -1 reads_R.fastq.gz -2 reads_F.fastq.gz -S sample.sam
The -x index was the one recommended on the hisat2 website.
The files looked fine, and I proceeded to generate the BAM, sorted BAM and BAI files using samtools.
However, when I run stringtie with this command:
stringtie file.sorted.bam -G Homo.sapiens.GRCh37.75.gtf -o file.gtf
some samples lose genes as important as CCND1, while other samples keep them. This is very strange, because other transcripts are reported with zero coverage/FPKM/TPM, so I expected CCND1 to appear in my file.gtf even without coverage. Am I doing something wrong? If I use the same command for all the samples, how can it be that I lose this transcript in one sample but not in the others? In addition, how can it be that I don't get the same number of genes in all my generated files if I am using the same Homo.sapiens.GRCh37.75.gtf?
Thanks
I answered it at the bottom.
What version of stringtie are you using? This issue has been mentioned in the stringtie GitHub issues: https://github.com/gpertea/stringtie/issues/141
One user reported that using version v1.3.3b as opposed to 1.2.3 fixed the issue (plotted), but other users still report seeing it with newer versions. I would try using the latest version, or 1.3.3b specifically, and see what happens.
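To compare runs before and after switching versions, a small helper like the one below can report which output GTFs mention the gene at all. This is only a sketch: gene_presence is a hypothetical helper name, and it assumes the gene name appears as a quoted attribute value (for example in gene_name or ref_gene_name) in the stringtie output.

```shell
#!/usr/bin/env bash
# gene_presence GENE FILE.gtf [FILE.gtf ...]
# For each GTF, print "<file>\tpresent" if any line contains the quoted
# gene name (e.g. "CCND1"), otherwise "<file>\tabsent".
# Assumption: the gene name shows up as a quoted attribute value.
gene_presence() {
    local gene="$1"
    shift
    local gtf
    for gtf in "$@"; do
        if grep -F -q "\"$gene\"" "$gtf"; then
            printf '%s\tpresent\n' "$gtf"
        else
            printf '%s\tabsent\n' "$gtf"
        fi
    done
}
```

It could be called as gene_presence CCND1 sample1.gtf sample2.gtf, where the sample file names are placeholders for your own stringtie outputs.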
You could also try a different aligner: https://github.com/gpertea/stringtie/issues/158 (STAR seems to do well according to the last poster's comment. STAR is actually my go-to aligner, and I have used it with stringtie in the past.)
This user reports the same issue (albeit on the older 1.2.3, which has the acknowledged bug), but attributes it to the hisat2 parameters: https://github.com/gpertea/stringtie/issues/61
Related issue: https://github.com/gpertea/stringtie/issues/102
Have you looked at the hisat2 BAM file to see whether there actually are any reads for that gene? This can easily be done with a genome browser such as IGV.
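If a genome browser is not at hand, a rough command-line version of the same check is sketched below. It assumes samtools is installed, that the BAM is coordinate-sorted and indexed, and that chromosome names in the GTF match those in the BAM (e.g. 11 rather than chr11). The names gene_region and count_gene_reads are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# gene_region FILE.gtf GENE
# Print the span (chrom:start-end) covered by all GTF records whose
# attribute column contains gene_name "<GENE>". Prints nothing if the
# gene is not found. Only records on the first matching chromosome are used.
gene_region() {
    local gtf="$1" gene="$2"
    awk -F'\t' -v g="$gene" '
        $0 !~ /^#/ && index($9, "gene_name \"" g "\"") {
            if (chrom == "") { chrom = $1; first = $4 + 0; last = $5 + 0 }
            if ($1 != chrom) next
            if ($4 + 0 < first) first = $4 + 0
            if ($5 + 0 > last)  last  = $5 + 0
        }
        END { if (chrom != "") print chrom ":" first "-" last }
    ' "$gtf"
}

# count_gene_reads FILE.sorted.bam FILE.gtf GENE
# Count alignments overlapping the gene span with `samtools view -c`.
# Assumption: an index (.bai) exists next to the BAM.
count_gene_reads() {
    local bam="$1" gtf="$2" gene="$3" region
    region="$(gene_region "$gtf" "$gene")"
    if [ -z "$region" ]; then
        echo "gene $gene not found in $gtf" >&2
        return 1
    fi
    samtools view -c "$bam" "$region"
}
```

For example: count_gene_reads file.sorted.bam Homo.sapiens.GRCh37.75.gtf CCND1. A count of 0 in the affected sample would suggest the reads are already missing after alignment, whereas a clearly non-zero count would point towards stringtie.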