Hi, apologies if this is a trivial question.
I've mapped SOLiD reads using BFAST and then realigned them with SRMA. There are considerably fewer one-off mismatch errors in the SRMA-modified data (in fact, SRMA looks like a superior tool to GATK's local realigner, which I was using previously).
At the bases where SRMA has altered the initial mapping, the base qualities all seem to be set to 1. I'm keen to work out what base quality BFAST would have given these bases had it originally chosen the SRMA mapping instead of the alignment it actually chose. I can't find such a tool in SRMA or BFAST, so I'd like to know whether one already exists (no doubt it would be painfully slow if I coded it by hand).
For example, a read mapped in RB1 prior to realignment gives:
6625791961 0 chr13 49055589 255 50M * 0 0
TTTTAGGAAAATCACTTTGTCTAACTCAGACTTATTTTTAAAAAGAAATC
6OWGV
WCPNCV
K"4NFB:69'"77"%=<>D)'::F-"9ED8%
XA:i:2 MD:Z:30A19 XE:Z:---------------------0-------0---0-----2----1----- PG:Z:bfast IH:i:1 NH:i:1 HI:i:1 CM:i:5 NM:i:1 CQ:Z:%(>>A-1<>@.,;=<)1<>=%%8-0)(%+%%)%%)+(-.&,%,1%%+1*% AS:i:1500 CS:Z:T00003202000321120011203012212012003000020000120032
The same read after realignment gives:
6625791961 0 chr13 49055589 255 50M * 0 0
TTTTAGGAAAATCACTTTGTCTAACTCAGAATTATTTTTAAAAAGAAATC
6OWGV
WCPNCV
K"4NFB:69'""""%=<>D)'::F-"9ED8%
XC:i:683 XE:Z:---------------------0-------012-0-----2----1----- PG:Z:srma NM:i:0 CQ:Z:%(>>A-1<>@.,;=<)1<>=%%8-0)(%+%%)%%)+(-.&,%,1%%+1*% AS:i:-33 CS:Z:T00003202000321120011203012212012003000020000120032
The MAPQ is the same, but the base qualities around the C/A change differ: in the QUAL string, 77 (Q22 in Phred+33) becomes "" (Q1).
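To pin down exactly what changed between the two records, a small script can decode the MD tag and diff SEQ and QUAL. This is a minimal sketch assuming standard SAM conventions (QUAL is Phred+33, and the 50M CIGAR means read and reference offsets line up); the helper names are hypothetical. Only the last fragment of each QUAL string is used, because that is the part that survived intact in the post.

```python
import re

# Assumptions: QUAL is Phred+33 (ASCII code minus 33), and the CIGAR is a
# plain match (50M here), so read positions equal reference offsets.


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 QUAL string into integer base qualities."""
    return [ord(ch) - 33 for ch in qual]


def md_mismatches(md: str) -> list[tuple[int, str]]:
    """Return (1-based read position, reference base) for each MD mismatch.

    Only valid for reads without insertions or clipping. Deletions (^ACG)
    consume reference bases only, so they do not move the read position.
    """
    mismatches = []
    pos = 0  # read bases consumed so far
    for run, _deleted, ref_base in re.findall(r"(\d+)|\^([A-Z]+)|([A-Z])", md):
        if run:
            pos += int(run)  # a run of matching bases
        elif ref_base:
            pos += 1  # one mismatched base; MD records the reference base
            mismatches.append((pos, ref_base))
    return mismatches


def diff_positions(before: str, after: str) -> list[int]:
    """0-based positions where two equal-length strings differ."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# SEQ values copied from the BFAST and SRMA records above.
seq_before = "TTTTAGGAAAATCACTTTGTCTAACTCAGACTTATTTTTAAAAAGAAATC"
seq_after = "TTTTAGGAAAATCACTTTGTCTAACTCAGAATTATTTTTAAAAAGAAATC"
# Only the final fragment of each QUAL string, as it appears in the post.
qual_before = 'K"4NFB:69\'"77"%=<>D)\'::F-"9ED8%'
qual_after = 'K"4NFB:69\'""""%=<>D)\'::F-"9ED8%'

if __name__ == "__main__":
    print(md_mismatches("30A19"))  # BFAST's MD tag
    for i in diff_positions(seq_before, seq_after):
        print("base", i + 1, seq_before[i], "->", seq_after[i])
    q_b, q_a = phred33(qual_before), phred33(qual_after)
    for i in diff_positions(qual_before, qual_after):
        print("qual fragment index", i, q_b[i], "->", q_a[i])
```

On these values it reports the MD mismatch at read position 31 (reference A), a single base change C -> A at position 31, and two quality values in the fragment dropping from 22 to 1.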
All the best,
Russ, Liverpool
If you look at their paper (http://genomebiology.com/2010/11/10/R99), the re-alignment scoring is described as follows:
"If the read base matches the start node base, then no penalty is added to the previous re-alignment score. Otherwise, a negative score based on the original base quality of the read is added to the previous re-alignment score to return the current re-alignment score. Other alignment scoring schemes are possible, but mismatched bases are scored using base quality since it has been shown to improve alignment quality."
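A simplified sketch of that rule, scoring a read against one candidate path through the graph. It assumes the mismatch penalty is exactly -Q (the read's original base quality); the paper only says the penalty is "based on" base quality, and this ignores the graph traversal itself, so it illustrates the rule rather than reproducing SRMA's code. The function name and example paths are hypothetical.

```python
def realignment_score(read: str, path: str, quals: list[int]) -> int:
    """Score a read against one candidate path using the quoted rule.

    Assumption: a mismatch adds -Q, where Q is the read's original base
    quality; a match adds nothing.
    """
    if not (len(read) == len(path) == len(quals)):
        raise ValueError("read, path and quals must have the same length")
    score = 0
    for base, node_base, q in zip(read, path, quals):
        if base != node_base:
            score -= q  # mismatch: penalty taken from original base quality
        # match: no penalty added
    return score


if __name__ == "__main__":
    read = "ACGTACGT"
    quals = [30, 30, 30, 2, 30, 30, 30, 30]
    # Hypothetical paths: one disagrees at a high-quality base, one at a low-quality base.
    original = "ACGTACGA"
    alternative = "ACGAACGT"
    print(realignment_score(read, original, quals),
          realignment_score(read, alternative, quals))
```

Here the two paths score -30 and -2. Under this rule, disagreeing with a low-quality base costs little, so a path that changes such a base can outscore the original alignment. The quoted passage does not say what base quality SRMA writes out afterwards.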
So perhaps they penalize mismatched bases using their base quality while computing the re-alignment score, and then keep that low quality because the base itself has also changed? Just speculating.