I'm getting confused while reading this part of the Bowtie2 manual about the scoring system.
Doubt:
What does "length-2 read gap" mean?
Is it like 2 mismatched bases in the read?
Could you please have a look and help me understand this?
From the Bowtie2 manual:
"A mismatched base at a high-quality position in the read receives a penalty of -6 by default. A length-2 read gap receives a penalty of -11 by default (-5 for the gap open, -3 for the first extension, -3 for the second extension). A base that matches receives a bonus of +2 by default. Thus, in local alignment mode, if the read is 50 bp long and it matches the reference exactly except for one mismatch at a high-quality position and one length-2 read gap, then the overall score equals the total bonus, 2 * 49, minus the total penalty, 6 + 11, = 81."
Hi Kiran. AFAIK, a gap is not the same as 2 mismatched bases; instead, it represents an insertion or a deletion.
As I understand the manual, a length-2 read gap means the alignment leaves 2 consecutive reference bases with no matching base in the read, i.e. the read has a gap 2 positions long. The penalty is one gap-open charge plus one extension charge per gap position: 5 + 3 + 3 = 11. In general, with the defaults (--rdg 5,3), a read gap of length N costs 5 + 3 * N. For example, suppose we have one ref sequence (ACTTGCA) and one query sequence (ACTA):
Ref: A C T T G C A
Seq: A C T - - - A
As you can see, there are 3 matches (ACT=ACT), a length-3 read gap (one gap open plus 3 extensions, 5 + 3 * 3 = 14), and a match (A=A). With the local-mode match bonus of +2, this scores 2 + 2 + 2 - 14 + 2 = -6. Your sequence would have a deletion of 3 bases according to this alignment. A length-2 read gap would look the same, but with only two dashes in the Seq row.
The scoring system helps the algorithm decide which alignment it should keep. For example, a second alignment could be:
Ref: A C T T G C A
Seq: A C - T - - A
This would score 2 + 2 - 8 + 2 - 11 + 2 = -11 (the length-1 gap costs 5 + 3 = 8, the length-2 gap costs 5 + 3 + 3 = 11). So the first alignment is better than the second one, based on this scoring system.
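If you want to check the arithmetic yourself, here is a minimal Python sketch of the scoring. It assumes the local-mode defaults quoted from the manual (+2 per match, -6 per high-quality mismatch) and the rule given for the --rdg/--rfg options that a gap of length N costs open + N * extend, i.e. 5 + 3 * N by default. score_alignment is a hypothetical helper written for illustration, not part of Bowtie2, and it ignores base qualities, N characters and soft clipping.

```python
def score_alignment(ref, read, match=2, mismatch=6, gap_open=5, gap_extend=3):
    """Score two equal-length aligned strings, where '-' marks a gap.

    Assumed defaults: +2 match, -6 mismatch, gap of length N costs 5 + 3 * N.
    """
    if len(ref) != len(read):
        raise ValueError("aligned strings must have the same length")
    score = 0
    current_gap = None  # None, "read" (dash in read) or "ref" (dash in ref)
    for r, q in zip(ref, read):
        if q == "-" or r == "-":
            kind = "read" if q == "-" else "ref"
            if current_gap != kind:
                score -= gap_open      # charged once per gap
            score -= gap_extend        # charged for every gap position
            current_gap = kind
        else:
            score += match if r == q else -mismatch
            current_gap = None
    return score


# The two toy alignments of ACTA against ACTTGCA
print(score_alignment("ACTTGCA", "ACT---A"))  # -6
print(score_alignment("ACTTGCA", "AC-T--A"))  # -11

# The manual's example: 50 bp read, one mismatch, one length-2 read gap
ref = "A" * 10 + "C" + "A" * 10 + "GG" + "A" * 29
read = "A" * 10 + "A" + "A" * 10 + "--" + "A" * 29
print(score_alignment(ref, read))  # 81
```

The last call reproduces the manual's total: 49 matches give 2 * 49 = 98, and the penalties 6 + 11 bring it down to 81.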
Thank you so much, this makes it much clearer.