I have a read that overhangs the start of my reference.
Its CIGAR string is, for example, 10S40M.
My goal is to call a consensus that is longer than the original reference, so that the 10 soft-clipped bases are taken into account when building the consensus. More than one read may overhang the reference, so I cannot simply add the soft-clipped bases to the reference by hand.
My approach follows two ideas:
1) Modify the CIGAR string so that 10S40M becomes 50M. I have also tried 10I40M.
2) Add x N's (or gaps "-") at the start of the reference, where x is the length of the longest soft-clip overhang found across all reads.
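To make the two ideas concrete, here is a minimal Python sketch of the bookkeeping involved. It is not a tested fix for the pipeline, only an illustration under stated assumptions: reads are given as hypothetical `(POS, CIGAR)` pairs with 1-based POS as in SAM, only a soft clip at the start of a read is handled, and hard clips are not considered. The helper names (`overhang`, `unclip_start`) and the toy reference string are made up for this example. Note one assumption the description above does not mention: once 10S40M becomes 50M, the read's POS is moved left by the clip length, and every POS is moved right by x to account for the padding.

```python
import re

# Matches one CIGAR operation, e.g. "10S" -> ("10", "S").
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def overhang(pos, cigar):
    """Number of leading soft-clipped bases that fall before reference position 1."""
    ops = parse_cigar(cigar)
    clip = ops[0][0] if ops and ops[0][1] == "S" else 0
    return max(0, clip - (pos - 1))


def unclip_start(pos, cigar, pad):
    """Return (new_pos, new_cigar) in the coordinates of the padded reference.

    Reads that overhang the start get their leading soft clip turned into M
    and their POS moved left by the clip length (assumption of this sketch).
    All reads are shifted right by `pad`, the number of N's prepended.
    """
    ops = parse_cigar(cigar)
    if overhang(pos, cigar) == 0:
        return pos + pad, cigar
    clip = ops[0][0]
    rest = ops[1:]
    if rest and rest[0][1] == "M":
        # Merge the former soft clip into the following match block.
        rest[0] = (rest[0][0] + clip, "M")
    else:
        rest.insert(0, (clip, "M"))
    new_cigar = "".join(f"{n}{op}" for n, op in rest)
    return pos + pad - clip, new_cigar


# Hypothetical input: (POS, CIGAR) pairs and a toy placeholder reference.
reads = [(1, "10S40M"), (1, "4S46M"), (5, "50M")]
reference = "ACGTACGTAC"

# x = longest soft-clip overhang over all reads.
pad = max(overhang(pos, cigar) for pos, cigar in reads)
padded_reference = "N" * pad + reference
adjusted = [unclip_start(pos, cigar, pad) for pos, cigar in reads]

print(pad)
print(adjusted)
```

With these toy reads, x is 10, the 10S40M read starts at position 1 of the padded reference, the 4S46M read at position 7, and the unclipped read moves from position 5 to 15. If the CIGAR is changed but POS is left as it was, the formerly clipped bases end up aligned to the wrong columns, so that is worth checking before looking at the variant-calling steps.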
But running the pipeline (mpileup, bcftools call, tabix, bcftools consensus) produces an output sequence that still has N's (or gaps) at the beginning instead of the 10 bases that were soft-clipped.
How do I achieve this?