Hi all, I have a question about indel calling:
I have a standard sample that carries an indel with a known frequency of 0.001. I used samtools to call this indel, and it does find it. However, because of alignment problems, some reads spanning the indel are soft-clipped instead of being aligned across it, so the number of reads counted as supporting the indel is too low. I have not found a good way to calculate the exact frequency of this indel.
For example:
ref sequence: TGGGGAGCTGGCTGAAGGGGGTGCTGAGCCCAAGGATCCACCCCCTCCCGGGCCCCATTCTGAGGACCTTAAGGTGAGTG
mapped read:
GGAGCTGGCTGAAGGGGGTGCTGAGCCCAAGCCAC
Here the sample has a 3 bp deletion (GAT) at positions 35-37 of the 80 bp reference sequence shown above. A 35 bp read like this one may get the CIGAR string 31M4S after mapping to the reference, instead of an alignment that contains the deletion (31M3D4M). Variant-calling software has difficulty recognizing such reads as supporting the indel.
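To make the example concrete, here is a minimal Python sketch of the check I have in mind: it tests whether the soft-clipped tail of a read matches the reference immediately after the deletion. The sequences are the ones above; the function name `clip_supports_deletion` and its parameters are hypothetical helpers of my own, not part of samtools or bwa, and the sketch only handles a clip on the right-hand end of the read with exact matching.

```python
# Sequences copied from the example above; all coordinates are 1-based.
REF = "TGGGGAGCTGGCTGAAGGGGGTGCTGAGCCCAAGGATCCACCCCCTCCCGGGCCCCATTCTGAGGACCTTAAGGTGAGTG"
READ = "GGAGCTGGCTGAAGGGGGTGCTGAGCCCAAGCCAC"


def clip_supports_deletion(ref, read, read_start, matched, del_start, del_end):
    """Return True if a read aligned as <matched>M<rest>S is consistent with
    a deletion of ref[del_start..del_end] (1-based, inclusive).

    read_start is the 1-based reference position of the first aligned base.
    Hypothetical helper for illustration only.
    """
    # Last reference position covered by the matched (M) block.
    aligned_end = read_start + matched - 1
    # The soft clip has to begin exactly where the deletion begins.
    if aligned_end != del_start - 1:
        return False
    clipped = read[matched:]  # the soft-clipped bases
    if not clipped:
        return False
    # Reference bases immediately after the deleted segment
    # (1-based del_end is the 0-based index of the next base).
    after_deletion = ref[del_end:del_end + len(clipped)]
    return clipped == after_deletion


# Locate the 31 matched bases in the reference to get the alignment start.
READ_START = REF.find(READ[:31]) + 1

print(READ_START)                                            # 4
print(clip_supports_deletion(REF, READ, READ_START, 31, 35, 37))  # True
```

Reads that pass a check like this could in principle be added to the reads whose CIGAR already contains the 3 bp deletion before dividing by the depth at that position, but I am not sure this is a sound way to estimate the frequency.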
My questions:
- Can I improve the accuracy of the indel frequency calculation by adjusting the bwa mapping parameters so that fewer of these reads are soft-clipped? If so, which parameters should I set, and to what values?
- Is there other variant-calling software that realigns the reads around the indel to improve the accuracy of the calculation? Since the frequency of this indel is very low, ordinary variant callers may not detect this locus well.
Thank you all in advance!