Bowtie2 not aligning read...but should?
1
0
Entering edit mode
6 days ago
theclubstyle ▴ 40

Hi all,

I'm trying to map a primer pair to a reference using bowtie2. One primer maps as expected, but the other just won't, no matter what I try. A pseudo-alignment should look like:

          AGAGTTTGATCATGGCTCAG
          ||| ||||||| ||| ||||
GTAGGCGGCAAGAATTTGATCTTGGTTCAGATTGAACGCTGGCGGCGTGGATGAGGCAT

So there are 3 mismatches, a maximum run of 7, and length of 20. Setting the mismatch penalty (--mp) to 5 should give a score of -15 (I think). There are no Q scores as everything is fasta. The seed length (-L) option is set to 6 and the --score-min option to L,0,-1.2 (which should give a minimum score of -24). As far as I can tell, bowtie2 should return this alignment but it just won't. If I ramp the sensitivity right up, other, nonsensical alignments are called but this obvious one isn't.

Can anyone tell me what I'm doing wrong with this? I would, as ever, be eternally grateful!

Note: I'm aware that bowtie2 isn't necessarily the right tool for this job, but I need a highly scaleable short read mapper that outputs SAM format and allows gaps (so bowtie 1 is out).

bowtie2 • 273 views
ADD COMMENT
0
Entering edit mode

bwa mem you tried? for the same

ADD REPLY
2
Entering edit mode
6 days ago
GenoMax 147k

You could use bbmap.sh from BBMap suite with local=t matches turned on. Showing the example with fasta formatted files but fastq will work as well. Currently sending the SAM output to STDOUT but you can redirect that to a BAM file (as long as samtools or sambamba is available in $PATH).

$ more query.fa subj.fa
::::::::::::::
query.fa
::::::::::::::
>query
AGAGTTTGATCATGGCTCAG
::::::::::::::
subj.fa
::::::::::::::
>subj
GTAGGCGGCAAGAATTTGATCTTGGTTCAGATTGAACGCTGGCGGCGTGGATGAGGCAT

Do the alignment

$ bbmap.sh -Xmx6g in=query.fa out=stdout.sam ref=subj.fa local=t k=6

You should see

   ------------------   Results   ------------------   
query   0       subj    11      11      3=1X7=1X3=1X4=  *       0       0       AGAGTTTGATCATGGCTCAG    *       NM:i:3  AM:i:11

Genome:                 1
Key Length:             6
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             1       (20 bases)

Mapping:                0.122 seconds.
Reads/sec:              8.20
kBases/sec:             0.16


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                 100.0000%               1       100.0000%                 20
unambiguous:            100.0000%               1       100.0000%                 20
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:         0.0000%               0         0.0000%                  0

Match Rate:                   NA               NA        85.0000%                 17
Error Rate:             100.0000%               1        15.0000%                  3
Sub Rate:               100.0000%               1        15.0000%                  3
Del Rate:                 0.0000%               0         0.0000%                  0
Ins Rate:                 0.0000%               0         0.0000%                  0
N Rate:                   0.0000%               0         0.0000%                  0
ADD COMMENT

Login before adding your answer.

Traffic: 1969 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6