Dear All,
I need to align many sequences against a single reference sequence locally. And I need the ungapped alignment. I ran the bowtie2 with the following options;
bowtie2 --local --rdg 1000,1000 --rfg 1000,1000 -x $myDatabase -f -U $FastaOfSequences -S $SamOutput
I have set gap opening and extension penalties to 1000 so that the alignments will not have any gap in between.
When I look at the sam output, I see some sequences that share a stretch of > 60 nt with the target sequence, but bowtie2 has not mapped them!
When I remove gap penalty options (--rdg 1000,1000 --rfg 1000,1000
), the mentioned problem does not happen. However, some of the alignments have a gap in between.
I am surprised that many of my sequences map with target when I use the default gap penalties but not with the increased gap penalties. While, there is no gap in their alignment (with default parameters)! It is odd because I expected that by increasing the gap penalty, I would get rid of alignments with gaps in between, but it also removes some ungapped alignments.
Later, I tried lowering gap penalties (--rdg 100,100 --rfg 100,100
). It worked perfectly and output gapless alignments.
Yet, the question is why bowtie2 comes up with this problem when high gap penalties are used.
I would appreciate it if anyone could double-check this issue. I have copied one of the query sequences and my target sequence. This is my query:
TTATATTTTTTTTTGACAAGCCTTCCTATTATTCTTTTATATATAAATTGATAATAAC TGTGTTAAAAAAAAAATATATTAAATAAAATATAAAAATAATAAATAAATATAAATAA TATTAATAATATAAATTTAAATTATATAATTTAGAAATAGTTTTGTTTTTATTTTGGT TTTATAAAATGTTTTTTCTATGTCTTGTGTGCTTAAG
This is my target (that you need to index it before searching the query against it):
TTTTATATTTTTTTTTGACAAGCCTTCCTATTATTCTTTT ATATATAAATTGATAATAACTGTGTTAAACAAAACTGATT ATTTTGATAATAAAATATACTATATGATAATATATCGTAA TTATATAATGATCCGAATATATCAAAATATAATGATAAAA ATAAAAAATCTTTTAAAAAAAAACTAAAAAATGATATAAT AAATAA
Could you give a concrete example of ungapped mapping, when you remove the high gap penalties?
This is really weird behaviour. The only way I can think this happening is that after the contiguous (ungapped) match since there are still nucleotides in the Query seq, it is trying to open a gap (even to match a single nucleotide)
Case1: When the gap penalty is too high, then the total alignment score is so low of the gapped alignment that it becomes lower than the
--score-min
threshold, and somehow bowtie is giving up saying that no alignment is possible.Case 2: The gap penalties are not very high; The
--score-min
threshold is not reached while opening the gap (even of 1-nucleotide), and so it is not giving up on the alignment. Rather, it is comparing the gapped with ungapped and finding that ungapped has a comparatively higher score, it finally is reporting the ungapped alignmentThough if this is happening, it is really a bug. Could you check if you reduce the
--score-min
to very small (lower than combined gap penalties), this problem persists or not?I lowered the --score-min, but it did not work. I also played with --score-min while using default gap penalties. With default gap penalties, when I lowered --score-min, two cases happened: if I lowered it too much, it did not align at all. However, if the decrease was moderate, it mapped the query to multiple sites of the target.
Thanks for checking the
--score-min
. I see a problem in lowering this score as what is happening is that if the threshold is too low, now it is possible to "cheaply" mismatch rather than opening a gap. And since there are many different ways to permute and combine mismatches, without probably changing the overall alignment score, it is showing too many alignments.