Hi guys, first time poster, please be nice :)
I am trying to separate haplotypes and get their frequenices from an PCR-amplified mixed population. My target is ~1800 bp reads and there are currently 10 known haplotypes. Those are made up from several SNPs and with a potential 6 bp deletion.
So far I work with artificially generated ONT reads but will use real data soon. My approach so far is using minimap2 and map the reads against a fasta file containing those 10 haplotypes as reference sequences. My code is:
minimap2 ${DRAFT} ${BASECALLS} \
-a -k 21 -w 11 --frag=yes -A 2 -B 8 -b 0 -r 100 -p 0.5 -N 20 \
-f 1000,5000 -n 2 -m 20 -s 40 -g 100 -2 -K 50m \
--heap-sort=yes --secondary=no \
-B 6 -O 2,8 -E 10,5 > aln.sam
I cant use the ONT mode, because the error rate would be too high, thats why im working with pseudo short reads.
Its working fine so far as it nicely maps the reads to the right references. There is one problem though: minimap2 doesnt open a gap at the deletion side (its rather to the end of the amplicon) and misaligns all reads that come after this deletion for those haplotypes that contain that deletion. I've tried working with the Gap open and extension penalties, but no avail so far.
Anyone has any ideas? Many thanks :)
See @Rob's answer for a possible explanation/solution : Alternative gap introduced when comparing insertion/deletion results between minimap2 and STAR2
Hi, thanks for the quick reply. Unfortunately, thats not exactly what my problem is. My introduced gap is not about whether its left or right aligned; its about that its not being detected at all.
As an example what I want:
What I get:
Thus leading to many positions being misaligned after the undetected deletion
In your example, are you inferring the misalignment, or is this what minimap2 is actually giving you (i.e. can you provide the CIGAR string of what is above?). In general, minimap2 is very eager to softclip reads during alignment. If it's possible to obtain a higher score by ignoring part of the read than accepting the gap open and extension penalties, then minimap2 may return such alignments; aligning the prefix of the read gapless and then softclipping the rest. You can diagnose this from the CIGAR strings. One solution may be to set a very large "end bonus" so that minimap2 tries very hard to align the read end-to-end.
The misalignmentis what I am getting with a few of my haplotypes. Here es an exemplary CIGAR strings on how it should look like on some of the haplotypes missing the deletion and thus correctly aligning:
5M2I13M1I49M2I7M1I2M2D25M1D280M1D9M3D25M1D139M1D5M1D4M1I169M1I7M1D131M1D38M1D25M1D15M1I4M2D53M2I53M1D11M1I82M1I5M1D47M5D180M1I129M1D343M
and here is one example of a haplotype with the deletion and thus with the misalignment at the end: 81M1I39M1D26M2I31M2D121M1D11M6I33M1I12M2D11M3D5M2D141M1I345M1I199M2I50M1I13M7D27M3D27M1D43M2D96M3D19M1I6M1D129M2I2M2D111M1I61M2D38M1D86M1D78M