Hello everyone,
Disclaimer: I'm a biologist doing a master's in bioinformatics, so I'm still getting to know the tools and analysis methods.
I'm using Infernal's cmsearch to find global matches between a covariance model of the U1 spliceosomal snRNA (an RNA sequence) and various eukaryotic genomes. U1 base-pairs with the 5' end of introns (the 5' splice site) as part of the splicing reaction in eukaryotes.
It's a pretty simple task: I run cmsearch with the -g option (to turn on glocal mode) and it returns hits, each with an alignment covering the whole U1 sequence.
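For reference, here is a minimal sketch of how that command could be assembled from Python. The file names (U1.cm, genome.fa, U1_hits.tbl) and the helper name build_cmsearch_command are placeholders I made up for illustration; -g and --tblout are the cmsearch options for glocal mode and for a tabular hit summary.

```python
import shlex


def build_cmsearch_command(cm_path, genome_path, tblout_path, glocal=True):
    """Return the cmsearch argument list (hypothetical helper, for illustration).

    glocal=True adds -g, so each hit must align to the full model.
    glocal=False leaves cmsearch in its default local mode.
    """
    cmd = ["cmsearch"]
    if glocal:
        cmd.append("-g")
    # --tblout writes one line per hit in a whitespace-delimited table.
    cmd += ["--tblout", tblout_path, cm_path, genome_path]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with the real model and genome paths.
    cmd = build_cmsearch_command("U1.cm", "genome.fa", "U1_hits.tbl")
    print(shlex.join(cmd))
    # To actually run it (requires Infernal on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Calling the helper with glocal=False gives the same command without -g, which is the local-mode search.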
My problem is that this isn't working for one species (call it X), because its genome has many insertion sequences that fragment the matching regions.
My question is: how do I perform a pattern search when the matching sequences in the database are fragmented into pieces that lie considerably far apart?
I have a FASTA file for the genome, a covariance model for U1, and a GFF file with annotations of gene and intron positions.
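To make the question more concrete, here is one possible direction, sketched under assumptions and not tested on real data: run cmsearch in its default local mode (without -g), then chain partial hits that sit on the same sequence and strand, follow each other along the genome, and cover successive parts of the model. The column positions assume the standard cmsearch --tblout layout (target name, accession, query name, accession, mdl, mdl from, mdl to, seq from, seq to, strand, ...). The function names and the max_gap value are hypothetical, and I don't know whether short fragments would even score above the reporting threshold.

```python
from collections import defaultdict


def parse_tblout(lines):
    """Parse cmsearch --tblout lines into dicts (assumes the standard column order)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.split(None, 17)
        hits.append({
            "target": f[0],
            "mdl_from": int(f[5]), "mdl_to": int(f[6]),
            "seq_from": int(f[7]), "seq_to": int(f[8]),
            "strand": f[9], "score": float(f[14]),
        })
    return hits


def chain_fragments(hits, max_gap=5000):
    """Greedily chain neighbouring hits that continue along the model.

    max_gap is a hypothetical limit (in nt) on the genomic distance
    between two fragments; it would need tuning for the real genome.
    """
    groups = defaultdict(list)
    for h in hits:
        groups[(h["target"], h["strand"])].append(h)

    chains = []
    for (_, strand), group in groups.items():
        # Order hits in the direction the model is read:
        # ascending coordinates on "+", descending on "-".
        group.sort(key=lambda h: h["seq_from"], reverse=(strand == "-"))
        current = [group[0]]
        for h in group[1:]:
            prev = current[-1]
            if strand == "+":
                gap = h["seq_from"] - prev["seq_to"]
            else:
                gap = prev["seq_to"] - h["seq_from"]
            # Extend the chain only if the next hit is close enough
            # and starts after the previous hit's model end.
            if 0 < gap <= max_gap and h["mdl_from"] > prev["mdl_to"]:
                current.append(h)
            else:
                chains.append(current)
                current = [h]
        chains.append(current)
    return chains
```

Chains with more than one member would be the candidate fragmented U1 copies, and their coordinates could then be compared against the insertion and intron positions in the GFF file.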