Hello,
I'm mapping scaffolded pacbio to a reference assembly with Minimap2. The scaffolds are very long, so I would expect mostly unique mappings. But that is far away from the case. I get the 4x more mappings than my original reads.
My questions are:
- Is this normal with such long scaffolds? Why does this happen, is it the Ns?
- Is this normal with unscaffolded pacbio assembly/ reads ? (also happens, see example in the end)
- How do I get only best/primary mappings?
I am aware of filtering bam files through the -q
and -f
parameters, though I don't know what I should give to -f
.
Here, for example, is the mapping of my scaffold1, with a length of 105192
samtools view contigs_against_ref.bam | grep "scaffold1," | cut -f1-5
(scaffold name) (FLAG) (HIT) (HIT_POS) (MAPQ)
scaffold1,105192 2048 NC_024468.2 10880006 60
scaffold1,105192 2048 NC_024468.2 20011587 60
scaffold1,105192 2048 NC_024468.2 35345645 29
scaffold1,105192 2064 NC_024468.2 38040128 24
(...)
scaffold1,105192 2064 NC_024468.2 48840231 60
scaffold1,105192 2048 NC_024468.2 58654417 19
(....)
scaffold1,105192 2064 NC_024468.2 114658611 60
scaffold1,105192 2048 NC_024468.2 115053147 60
scaffold1,105192 2048 NC_024468.2 115053624 41
(...)
scaffold1,105192 2064 NC_024468.2 129707383 60
scaffold1,105192 256 NC_024468.2 143808174 0
scaffold1,105192 2064 NC_024468.2 144478215 22
scaffold1,105192 2048 NC_024468.2 144478299 60
Same thing using my raw pacbio reads, here is read "29" mapping many times, but according to the FLAG they are all supplementary alignments?
29 0 NC_024465.2 13819334 60
29 2064 NC_024465.2 170243992 60
29 2048 NC_024465.2 13741283 60
29 2048 NC_024462.2 88062987 60
29 2048 NC_024468.2 83284448 60
29 2064 NC_024463.2 12104985 1
29 2064 NC_024459.2 209527685 31
Appreciate your attention,
Ricardo
Thank you for clearing it up. Same when I use the raw pacbio reads? I expect Structural Variation, the reference is from a different genotype of my plant.
I want to see where my reads map in the reference, in order to clean out reads that don't belong to the target region that I am trying to assemble. So I'm not interested in the SVs just yet, though it is the long run objective to identify them!