I mapped PacBio reads to a genome using bwa-mem2 and wanted the sequence similarity between the genes and the reads, which I could not retrieve from the resulting BAM file. Note that in some cases reads mapped to a gene in the direction opposite to the gene's strand.
I then ran a BLAT search of the PacBio reads against the genes db using default settings. The similarities turned out to be quite low (based on alignment length and bit score).
Example: bedIntersect result (reads mapped against genome db) - file 1
<contig> <qstart> <qstop> <ReadID> <read_mapped_direction> <gene_start> <gene_stop> <gene_name> <gene_strand>
contig_106 19256 20597 Read97 - 19401 21308 Gene_67 -
contig_106 19025 20633 Read97 + 19401 21308 Gene_67 -
Example: blat result (reads searched with BLAT against genes db) - file 2
<gene_name> <ReadID> <identity> <alignment_length> <mismatches> <gaps> <query_start> <query_stop> <sub_start> <sub_stop> <evalue> <bitscore>
Gene_67 Read97 88.51 705 80 1 1205 1908 3762 3058 2.4e-313 1070.0
The intersection length in file 1 (column 3 minus column 2) is greater than the alignment length reported by BLAT. Could this be due to strand bias? Any pointers on this, or on other ways to measure similarity between reads and genes, would be appreciated.
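To make the comparison concrete, here is a minimal Python sketch that recomputes the numbers from the two file 1 rows and sets them next to the BLAT alignment length. It assumes BED-style half-open coordinates (so a length is simply stop minus start); the function names `overlap_length` and `summarize` are only illustrative, not part of any tool.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals (0 if they do not overlap).

    Assumes half-open, BED-style coordinates.
    """
    return max(0, min(a_end, b_end) - max(a_start, b_start))


# Rows copied from file 1: contig, read start, read stop, read ID,
# read strand, gene start, gene stop, gene name, gene strand.
rows = [
    "contig_106 19256 20597 Read97 - 19401 21308 Gene_67 -",
    "contig_106 19025 20633 Read97 + 19401 21308 Gene_67 -",
]

# Alignment length copied from the file 2 row for Gene_67 / Read97.
BLAT_ALIGNMENT_LENGTH = 705


def summarize(line):
    """Parse one file 1 row and compute the read span and read/gene overlap."""
    f = line.split()
    read_start, read_end = int(f[1]), int(f[2])
    gene_start, gene_end = int(f[5]), int(f[6])
    return {
        "read": f[3],
        "read_strand": f[4],
        "gene": f[7],
        "gene_strand": f[8],
        "read_span": read_end - read_start,  # column 3 minus column 2
        "overlap": overlap_length(read_start, read_end, gene_start, gene_end),
    }


summaries = [summarize(r) for r in rows]
for s in summaries:
    print(
        s["read"], s["read_strand"], s["gene"], s["gene_strand"],
        "read_span=%d" % s["read_span"],
        "overlap=%d" % s["overlap"],
        "blat_aln_len=%d" % BLAT_ALIGNMENT_LENGTH,
    )
```

Under that coordinate assumption, both the mapped span and the part of it that actually falls inside Gene_67 come out longer than the BLAT alignment length for the same read and gene.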
Thanks a lot. To clarify: the bedIntersect file comes from the reads mapped against the genome db, and the blat result comes from the reads searched against the genes db. I did this to find the similarity between the genes and the reads.
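One possible way to get an identity estimate directly from the genome alignments, sketched here as an assumption rather than a tested pipeline: if each BAM record carries a CIGAR string and an NM (edit distance) tag, a BLAST-like identity can be approximated as (alignment columns - NM) / alignment columns. The function name `blast_like_identity` and the example CIGAR and NM values are hypothetical, not taken from my data.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def blast_like_identity(cigar, nm):
    """Approximate identity from a CIGAR string and the NM edit distance.

    Alignment columns are counted from M, =, X, I and D operations.
    Clipping (S, H), skips (N) and padding (P) are ignored.
    NM counts mismatches plus inserted and deleted bases, so
    columns - NM is the number of matching columns.
    """
    columns = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in "M=XID":
            columns += int(length)
    if columns == 0:
        return 0.0
    return (columns - nm) / columns


# Hypothetical record: 200 alignment columns, edit distance 12.
example_identity = blast_like_identity("100M2I50M3D45M", 12)
print("identity = %.2f%%" % (100 * example_identity))
```

This only describes the aligned part of each read against the genome, so it would still need to be restricted to the portion overlapping the gene to be comparable with the blat identity.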