Question

Similarity search and strand difference

20 months ago

L_bioinfo • 0

I mapped PacBio reads to a genome using bwa-mem2 and wanted to get the sequence similarity between the genes and the reads, which could not be retrieved from the resulting BAM file. Note that in certain cases, reads mapped to genes in the direction opposite to the gene's strand.

Then I took the PacBio reads and ran a blat search with default settings against the genes db. I found the similarities to be quite low (based on alignment length and bitscore).

Example: bedIntersect file (file 1)
<contig> <querystart> <querystop> <ReadID> <read_mapped_direction> <genestart> <genestop> <gene_name> <genestrand>
contig_106        19256   20597   Read97     -       19401   21308   Gene_67     -
contig_106        19025   20633   Read97     +       19401   21308   Gene_67     -

Example: blat result, reads blat against genes db (file 2)
<gene_name> <ReadID> <identity> <alignment_length> <mismatches> <gaps> <querystart> <querystop> <substart> <substop> <evalue> <bitscore>
Gene_67       Read97     88.51   705     80      1       1205    1908    3762    3058    2.4e-313        1070.0

The intersection length in file 1 (column 3 minus column 2) is higher than the alignment length from blat. Could this be due to strand bias? Please provide some pointers on this, or on other ways to find the similarity between reads and genes.

blat bwa-mem2 • 592 views

score 1 · Answer 1 · 2023-05-10

20 months ago

Istvan Albert 102k

The tools perform different kinds of alignment: semi-global versus local.

In addition, it looks like you are aligning against a genome in one case and against gene sequences in the other.

Both of these factors lead to different results. The strand should not matter at all: as far as the aligners are concerned, it is not a factor in the alignment process.
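To put numbers on the comparison, here is a minimal sketch using only the example rows from the question. It assumes file 1 uses BED-style half-open coordinates and that reversed subject coordinates in the blat row indicate a minus-strand match, as in BLAST-style tabular output. The helper name `overlap_length` is made up for illustration.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Overlap between two intervals on the same contig (0 if they are disjoint).

    Assumes BED-style half-open coordinates, so length = end - start.
    """
    return max(0, min(a_end, b_end) - max(a_start, b_start))


# Rows copied from file 1 in the question: (read_start, read_stop, read_strand)
read_hits = [(19256, 20597, "-"), (19025, 20633, "+")]
gene_start, gene_stop = 19401, 21308

# Genomic overlap between each read mapping and Gene_67
overlaps = [overlap_length(s, e, gene_start, gene_stop) for s, e, _ in read_hits]

# Values copied from the file 2 row in the question
blat_aln_len = 705
sub_start, sub_stop = 3762, 3058

# Assumption: start > stop on the subject means the hit is on the minus strand
blat_orientation = "-" if sub_start > sub_stop else "+"

for (s, e, strand), ov in zip(read_hits, overlaps):
    print(f"read {s}-{e} ({strand}): overlap with gene = {ov}, blat alignment = {blat_aln_len}")
print(f"blat hit orientation: {blat_orientation}")
```

The overlap is measured in genome coordinates from the bwa-mem2 mapping, while the blat alignment length comes from a separate local alignment against the gene sequence, so the two numbers are not expected to agree.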

Thanks a lot. The bedIntersect file is from reads mapped against the genome db; the blat result is from reads against the genes db. I did this to find the similarity between the genes and the reads.

ADD REPLY • link 20 months ago by L_bioinfo • 0