Question

Minimap2 mapping problem

1

Entering edit mode

5 months ago

Maximilian ▴ 10

Hi guys, first time poster, please be nice :)

I am trying to separate haplotypes and get their frequenices from an PCR-amplified mixed population. My target is ~1800 bp reads and there are currently 10 known haplotypes. Those are made up from several SNPs and with a potential 6 bp deletion.

So far I work with artificially generated ONT reads but will use real data soon. My approach so far is using minimap2 and map the reads against a fasta file containing those 10 haplotypes as reference sequences. My code is:

minimap2 ${DRAFT} ${BASECALLS} \
    -a -k 21 -w 11 --frag=yes -A 2 -B 8 -b 0 -r 100 -p 0.5 -N 20 \
    -f 1000,5000 -n 2 -m 20 -s 40 -g 100 -2 -K 50m \
    --heap-sort=yes --secondary=no \
    -B 6 -O 2,8 -E 10,5 > aln.sam

I cant use the ONT mode, because the error rate would be too high, thats why im working with pseudo short reads.

Its working fine so far as it nicely maps the reads to the right references. There is one problem though: minimap2 doesnt open a gap at the deletion side (its rather to the end of the amplicon) and misaligns all reads that come after this deletion for those haplotypes that contain that deletion. I've tried working with the Gap open and extension penalties, but no avail so far.

Anyone has any ideas? Many thanks :)

amplicon reference minimap2 mapping haplotype • 637 views

ADD COMMENT • link 5 months ago by Maximilian ▴ 10

0

Entering edit mode

See @Rob's answer for a possible explanation/solution : Alternative gap introduced when comparing insertion/deletion results between minimap2 and STAR2

ADD REPLY • link 5 months ago by GenoMax 148k

0

Entering edit mode

Hi, thanks for the quick reply. Unfortunately, thats not exactly what my problem is. My introduced gap is not about whether its left or right aligned; its about that its not being detected at all.

As an example what I want:

Refer : GCTACATGGCATGATGGACGTATGACGAAGTGAAAGC

Query: GCTACATGGC - - - - - -GACGTATGACGAAGTGAAAGC

What I get:

Refer : GCTACATGGCATGATGGACGTATGACGAAGTGAAAGC

Query: GCTACATGGCGACGTATGACGAAGTGAAAGC

Thus leading to many positions being misaligned after the undetected deletion

ADD REPLY • link 5 months ago by Maximilian ▴ 10

0

Entering edit mode

In your example, are you inferring the misalignment, or is this what minimap2 is actually giving you (i.e. can you provide the CIGAR string of what is above?). In general, minimap2 is very eager to softclip reads during alignment. If it's possible to obtain a higher score by ignoring part of the read than accepting the gap open and extension penalties, then minimap2 may return such alignments; aligning the prefix of the read gapless and then softclipping the rest. You can diagnose this from the CIGAR strings. One solution may be to set a very large "end bonus" so that minimap2 tries very hard to align the read end-to-end.

ADD REPLY • link 5 months ago by Rob 6.9k

0

Entering edit mode

The misalignmentis what I am getting with a few of my haplotypes. Here es an exemplary CIGAR strings on how it should look like on some of the haplotypes missing the deletion and thus correctly aligning:

5M2I13M1I49M2I7M1I2M2D25M1D280M1D9M3D25M1D139M1D5M1D4M1I169M1I7M1D131M1D38M1D25M1D15M1I4M2D53M2I53M1D11M1I82M1I5M1D47M5D180M1I129M1D343M enter image description here

and here is one example of a haplotype with the deletion and thus with the misalignment at the end: 81M1I39M1D26M2I31M2D121M1D11M6I33M1I12M2D11M3D5M2D141M1I345M1I199M2I50M1I13M7D27M3D27M1D43M2D96M3D19M1I6M1D129M2I2M2D111M1I61M2D38M1D86M1D78M enter image description here

ADD REPLY • link 5 months ago by Maximilian ▴ 10