Hello everyone!
I would really appreciate advice on a research problem I've been stuck on for the past few weeks. I am working with tomato SL4 genomes. I have BAM files with RNA-seq data for multiple tomatoes, and FASTA files for the reference genome.

In one experimental group, we know there is non-canonical splicing of a particular gene that removes 18 nucleotides from the gene product. In another experimental group, we know there is canonical splicing of that same gene. We know this because a previous group used IGB to visualize the reads and manually count them, essentially comparing non-splicing activity and splicing activity between groups.

My goal is to automate this process. I know the exact genomic coordinates of the gene of interest for my version of the tomato genome, for both the canonical and the non-canonical form. I'm aware that I can use a tool like HISAT2 to map RNA-seq reads from my BAM files to the reference genome, but I'm worried that doing this blindly would not serve my goal, since the splicing is by definition non-canonical and not the same splicing as in the reference genome.

Does anyone have advice on how to move forward with this? Should I be modifying the reference genome to try to force HISAT2 to accept my non-canonical splicing, or is there another tool out there that does what I'm trying to do? I've been looking into other tools, but most of them are extremely outdated, from 10+ years ago. Any advice would be greatly appreciated.
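
To make the goal concrete, this is roughly the counting I'd like to automate, as a minimal Python sketch and not a finished method. It assumes the reads are already aligned and that the aligner reports each spliced read with an `N` operation in its CIGAR string, which is exactly the part I'm unsure about for the non-canonical case. The chromosome name, the coordinates, and the demo reads are made-up placeholders, not my real data, and the function names are just illustrative.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_read(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive,
    one for each N (skipped region) operation in the CIGAR string."""
    junctions = []
    ref = pos  # 1-based leftmost reference position of the alignment
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def count_junctions(sam_lines, chrom):
    """Count alignment records supporting each junction on one chromosome.

    Counts are per SAM record, so both mates of a pair are counted
    separately if each of them spans the junction."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*" or flag & 0x4:  # 0x4 = unmapped
            continue
        for junction in set(junctions_from_read(pos, cigar)):
            counts[junction] += 1
    return counts


# Hypothetical placeholder coordinates (intron start, intron end).
CANONICAL = (150, 249)
NONCANONICAL = (150, 267)  # intron 18 nt longer, so 18 nt fewer in the product

# Made-up SAM records, for illustration only.
demo_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M100N50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t110\t60\t40M100N60M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t100\t60\t50M118N32M\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t300\t60\t100M\t*\t0\t0\t*\t*",
]

counts = count_junctions(demo_sam, "chr1")
print("canonical:", counts[CANONICAL])
print("non-canonical:", counts[NONCANONICAL])
```

In practice the SAM lines would come from something like `samtools view` restricted to the gene's region for each sample, and the two counts would then be compared between the experimental groups.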