Multiple mapping with longreads - Get primary mapping
1
0
Entering edit mode
4.7 years ago

Hello,

I'm mapping scaffolded pacbio to a reference assembly with Minimap2. The scaffolds are very long, so I would expect mostly unique mappings. But that is far away from the case. I get the 4x more mappings than my original reads.

My questions are:

  • Is this normal with such long scaffolds? Why does this happen, is it the Ns?
  • Is this normal with unscaffolded pacbio assembly/ reads ? (also happens, see example in the end)
  • How do I get only best/primary mappings?

I am aware of filtering bam files through the -q and -f parameters, though I don't know what I should give to -f.

Here, for example, is the mapping of my scaffold1, with a length of 105192

samtools view contigs_against_ref.bam  | grep "scaffold1,"  | cut -f1-5

 (scaffold name)   (FLAG)     (HIT)     (HIT_POS) (MAPQ)
scaffold1,105192    2048    NC_024468.2 10880006    60    
scaffold1,105192    2048    NC_024468.2 20011587    60    
scaffold1,105192    2048    NC_024468.2 35345645    29    
scaffold1,105192    2064    NC_024468.2 38040128    24    
(...)  
scaffold1,105192    2064    NC_024468.2 48840231    60          
scaffold1,105192    2048    NC_024468.2 58654417    19  
(....)
scaffold1,105192    2064    NC_024468.2 114658611   60
scaffold1,105192    2048    NC_024468.2 115053147   60
scaffold1,105192    2048    NC_024468.2 115053624   41
(...)
scaffold1,105192    2064    NC_024468.2 129707383   60
scaffold1,105192    256     NC_024468.2 143808174   0
scaffold1,105192    2064    NC_024468.2 144478215   22
scaffold1,105192    2048    NC_024468.2 144478299   60

Same thing using my raw pacbio reads, here is read "29" mapping many times, but according to the FLAG they are all supplementary alignments?

29  0   NC_024465.2 13819334    60     
29  2064    NC_024465.2 170243992   60   
29  2048    NC_024465.2 13741283    60
29  2048    NC_024462.2 88062987    60
29  2048    NC_024468.2 83284448    60
29  2064    NC_024463.2 12104985    1
29  2064    NC_024459.2 209527685   31

Appreciate your attention,
Ricardo

mapping minimap2 long reads pacbio samtools • 1.8k views
ADD COMMENT
1
Entering edit mode
4.7 years ago

It looks like your scaffolds are not mapping to multiple locations entirely, but rather have a non-linear alignment. The scaffolds contain structural variants (or errors) and as such get split in multiple alignments to the reference. So probably each part of a scaffold aligns once, but each scaffold gives rise to multiple alignments.

ADD COMMENT
0
Entering edit mode

Thank you for clearing it up. Same when I use the raw pacbio reads? I expect Structural Variation, the reference is from a different genotype of my plant.

I want to see where my reads map in the reference, in order to clean out reads that don't belong to the target region that I am trying to assemble. So I'm not interested in the SVs just yet, though it is the long run objective to identify them!

ADD REPLY

Login before adding your answer.

Traffic: 1987 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6