Question

Bowtie2 not aligning read...but should?

0

Entering edit mode

5 weeks ago

theclubstyle ▴ 40

Hi all,

I'm trying to map a primer pair to a reference using bowtie2. One primer maps as expected, but the other just won't, no matter what I try. A pseudo-alignment should look like:

          AGAGTTTGATCATGGCTCAG
          ||| ||||||| ||| ||||
GTAGGCGGCAAGAATTTGATCTTGGTTCAGATTGAACGCTGGCGGCGTGGATGAGGCAT

So there are 3 mismatches, a maximum run of 7, and length of 20. Setting the mismatch penalty (--mp) to 5 should give a score of -15 (I think). There are no Q scores as everything is fasta. The seed length (-L) option is set to 6 and the --score-min option to L,0,-1.2 (which should give a minimum score of -24). As far as I can tell, bowtie2 should return this alignment but it just won't. If I ramp the sensitivity right up, other, nonsensical alignments are called but this obvious one isn't.

Can anyone tell me what I'm doing wrong with this? I would, as ever, be eternally grateful!

Note: I'm aware that bowtie2 isn't necessarily the right tool for this job, but I need a highly scaleable short read mapper that outputs SAM format and allows gaps (so bowtie 1 is out).

bowtie2 • 330 views

ADD COMMENT • link updated 5 weeks ago by GenoMax 148k • written 5 weeks ago by theclubstyle ▴ 40

0

Entering edit mode

bwa mem you tried? for the same

ADD REPLY • link 5 weeks ago by 1769mkc ★ 1.2k

score 2 · Answer 1 · 2024-11-14

You could use bbmap.sh from BBMap suite with local=t matches turned on. Showing the example with fasta formatted files but fastq will work as well. Currently sending the SAM output to STDOUT but you can redirect that to a BAM file (as long as samtools or sambamba is available in $PATH).

$ more query.fa subj.fa
::::::::::::::
query.fa
::::::::::::::
>query
AGAGTTTGATCATGGCTCAG
::::::::::::::
subj.fa
::::::::::::::
>subj
GTAGGCGGCAAGAATTTGATCTTGGTTCAGATTGAACGCTGGCGGCGTGGATGAGGCAT

Do the alignment

$ bbmap.sh -Xmx6g in=query.fa out=stdout.sam ref=subj.fa local=t k=6

You should see

   ------------------   Results   ------------------   
query   0       subj    11      11      3=1X7=1X3=1X4=  *       0       0       AGAGTTTGATCATGGCTCAG    *       NM:i:3  AM:i:11

Genome:                 1
Key Length:             6
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             1       (20 bases)

Mapping:                0.122 seconds.
Reads/sec:              8.20
kBases/sec:             0.16


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%               1       100.0000%                 20
unambiguous:            100.0000%               1       100.0000%                 20
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:         0.0000%               0         0.0000%                  0

Match Rate:                   NA               NA        85.0000%                 17
Error Rate:             100.0000%               1        15.0000%                  3
Sub Rate:               100.0000%               1        15.0000%                  3
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                 0.0000%               0         0.0000%                  0
N Rate:                   0.0000%               0         0.0000%                  0