I have some PacBio RNA-seq data that should contain a jumbled gene (i.e. the exons are not in the canonical order but instead go something like 1, 2, 3, 5, 6, 4, 5, 7, 8, etc.: scrambled exons). I thought that by mapping my FASTQ with HISAT2 and then assembling transcripts from the resulting BAM with the reference GTF, I would see this jumbling event in the GTF produced from my BAM. But nothing shows up: the class codes are all "=" for this gene when I run GffCompare. If I open the BAM in IGV I can see the jumbling event, but what I'm looking for is a way of finding other jumbling events that I don't already know about. Any suggestions?
If you have PacBio data, why are you using HISAT2? A proper long-read aligner that can accommodate both the error profile and the read length, like minimap2, would be a much better choice.

Okay, that makes sense. But what about the downstream pipeline after minimap2? Was I right in assuming that StringTie and GffCompare should show me jumbled/scrambled exons?
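One likely reason the scrambled exons do not show up in the StringTie/GffCompare output: a single SAM/BAM alignment record describes a spliced read through its CIGAR string, where each N operation skips an intron and no operation ever moves backwards along the reference. The minimal Python sketch below (a hypothetical helper, not part of any tool mentioned in this thread) turns a CIGAR string into aligned reference blocks. By construction the blocks always come out in increasing genomic order, so an exon order such as 5, 6, 4 cannot be expressed in one record. As far as I know, aligners instead report such a read as a primary alignment plus one or more supplementary alignments (SA tag), and an assembler that builds linear exon chains would not reconstruct the back-jump, which may be why GffCompare reports "=" for this gene.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec ops).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_blocks(ref_start, cigar):
    """Return aligned reference blocks [(start, end), ...], 0-based half-open.

    Blocks are split at N operations (introns in a spliced alignment).
    Deletions (D) stay inside the current block; I, S, H and P do not
    consume reference bases, so they never move the position.
    """
    blocks = []
    pos = ref_start
    block_start = pos
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=XD":
            pos += length  # consumes reference, same block
        elif op == "N":
            if pos > block_start:
                blocks.append((block_start, pos))
            pos += length  # skip over the intron
            block_start = pos
    if pos > block_start:
        blocks.append((block_start, pos))
    return blocks


# Toy spliced alignment starting at reference position 100 (hypothetical):
print(reference_blocks(100, "5S50M200N30M2D20M300N40M10S"))
# [(100, 150), (350, 402), (702, 742)]  (always in increasing genomic order)
```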
Are your individual reads long enough to span these shuffled exons, and do you have enough read depth (number of reads aligned) to have confidence in the alignments? You will have to examine the alignments carefully to see how minimap2 aligns the reads.

Just to clarify: if the HISAT2 pipeline has produced results that make sense to you, then great. I am just saying that it would be useful to also examine what minimap2 does.
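To look for scrambled exons you don't already know about, one option is to work per read rather than per assembled transcript: collect each read's primary and supplementary alignments, order them along the read, translate each aligned block into an exon number, and flag reads whose exon order goes backwards. Counting how many reads share each backwards path gives the kind of read support mentioned above. The Python below is a minimal sketch of that idea, not an existing tool; the function names, the toy exon table and the input structure are all hypothetical. It assumes a plus-strand gene, exon coordinates taken from the reference GTF (0-based, half-open, in genomic order), and that you have already extracted each alignment's aligned reference blocks (for example with pysam's AlignedSegment.get_blocks()) together with where that alignment starts in the original read orientation, which needs some care with hard clips and reverse-strand records.

```python
from collections import Counter


def exon_index(block, exons, min_overlap=10):
    """Return the 1-based index of the exon the block overlaps most, or None."""
    best, best_ov = None, 0
    for i, (start, end) in enumerate(exons, start=1):
        ov = min(block[1], end) - max(block[0], start)
        if ov > best_ov:
            best, best_ov = i, ov
    return best if best_ov >= min_overlap else None


def exon_path(segments, exons):
    """segments: [(query_start, [ref blocks]), ...] for one read
    (primary + supplementary). Returns exon indices in read order,
    collapsing consecutive repeats of the same exon."""
    path = []
    for _, blocks in sorted(segments, key=lambda seg: seg[0]):
        for block in blocks:
            idx = exon_index(block, exons)
            if idx is not None and (not path or path[-1] != idx):
                path.append(idx)
    return tuple(path)


def is_scrambled(path):
    """True if the exon order ever goes backwards along the read."""
    return any(b < a for a, b in zip(path, path[1:]))


def count_scrambled_paths(reads, exons, min_reads=2):
    """Count reads per backwards exon path; keep paths with >= min_reads reads."""
    counts = Counter()
    for segments in reads.values():
        path = exon_path(segments, exons)
        if is_scrambled(path):
            counts[path] += 1
    return {path: n for path, n in counts.items() if n >= min_reads}


# Toy example (hypothetical coordinates): eight 100-bp exons.
EXONS = [(0, 100), (200, 300), (400, 500), (600, 700),
         (800, 900), (1000, 1100), (1200, 1300), (1400, 1500)]
# Supplementary alignment (read offset 500) covers exons 4, 5, 7, 8;
# primary alignment (read offset 0) covers exons 1, 2, 3, 5, 6.
scrambled = [(500, [(600, 700), (800, 900), (1200, 1300), (1400, 1500)]),
             (0, [(0, 100), (200, 300), (400, 500), (800, 900), (1000, 1100)])]
canonical = [(0, [(0, 100), (200, 300), (400, 500)])]
reads = {"read1": scrambled, "read2": scrambled, "read3": canonical}
print(count_scrambled_paths(reads, EXONS))
# {(1, 2, 3, 5, 6, 4, 5, 7, 8): 2}
```

Exon skipping (e.g. 1, 2, 3, 5, 6, 7, 8) is not flagged, because the order never goes backwards. Collapsing consecutive repeats of the same exon index avoids double-counting an exon that is split by a deletion or shared at the boundary between two alignments, at the cost of hiding a tandem duplication of a single exon.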