meaning of "primary alignment" in samtools
4 months ago

Hi,

Sorry in advance for this question, but I wasn't able to find the samtools documentation explaining the meaning of "primary alignment" and "secondary alignment". Could someone explain what they are, and why both kinds of alignment are listed in the flagstat report?

Thanks

samtools • 1.1k views

Check the SAM file format specification (search for primary/secondary) to get the info: https://samtools.github.io/hts-specs/SAMv1.pdf

Multiple mapping: The correct placement of a read may be ambiguous, e.g., due to repeats. In this case,
there may be multiple read alignments for the same read. One of these alignments is considered
primary. All the other alignments have the secondary alignment flag set in the SAM records that
represent them. 
4 months ago

Basically, the rule is that only one alignment for a given read can be marked as primary, and all others must be marked as secondary. How it is decided which alignment is primary and which are secondary is left entirely to the aligner, and in many cases aligners can be configured to do it differently.

A common scheme is to report all alignments that reach a certain score and mark the one with the highest score as the "primary alignment". If there is a tie for the best score, the primary alignment is selected at random from amongst the tied alignments.

Another common scheme is to report only the best-scoring alignment for each read and mark it as primary. Again, if there is a tie, one alignment is chosen at random. The other, equally good, alignments may or may not also be reported (as secondary alignments), depending on configuration.
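
To make the first scheme concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not how any particular aligner is implemented: the function name `mark_primary`, the `min_score` threshold and the dictionary layout of a hit are all made up for this example.

```python
import random

def mark_primary(hits, min_score, rng=random):
    """Label candidate hits for ONE read as primary or secondary.

    hits: list of dicts like {"pos": "chr1:100", "score": 60}
          (a hypothetical layout, chosen only for this sketch).
    """
    # Keep only the hits that reach the reporting threshold.
    reported = [h for h in hits if h["score"] >= min_score]
    if not reported:
        return []

    # Find every hit that shares the best score, then break the tie at random.
    best = max(h["score"] for h in reported)
    tied = [i for i, h in enumerate(reported) if h["score"] == best]
    primary_idx = rng.choice(tied)

    # Exactly one hit becomes primary; every other reported hit is secondary.
    return [
        dict(h, kind="primary" if i == primary_idx else "secondary")
        for i, h in enumerate(reported)
    ]

if __name__ == "__main__":
    hits = [
        {"pos": "chr1:100", "score": 60},
        {"pos": "chr2:500", "score": 60},  # ties with the first hit
        {"pos": "chr3:900", "score": 45},
        {"pos": "chr4:10", "score": 5},    # below threshold, not reported
    ]
    for hit in mark_primary(hits, min_score=30):
        print(hit)
```

Which of the two tied hits ends up primary varies from run to run, which is exactly the point: the label says nothing about one being better than the other.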


Thank you very much for your answers, but all this is becoming more and more unclear (too bad, because the flagstat report was already unclear enough). If you say that a read may have a primary alignment plus secondary or multiple alignments, then why is the sum of the primary, secondary and supplementary reads exactly equal to the total number of input reads?

More generally, how is it possible to define a read as primary, secondary or supplementary before the mapping? And why, after mapping, is the number of primary mapped reads different from the number of input primary reads?

I think I am confusing the terms "primary" and "uniquely".

Here is my flagstat report to help you understand:

2042633 + 0 in total (QC-passed reads + QC-failed reads) 
1237849 + 0 primary
539335 + 0 secondary
265449 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
1757582 + 0 mapped (86.04% : N/A) 
952798 + 0 primary mapped (76.97% : N/A)

The number listed here as "total" is not the total number of reads, but the total number of alignments (the wording in the brackets is indeed confusing).

So, your SAM/BAM file has 2,042,633 lines in it (not counting the header). We will refer to these as "alignments".

An alignment must be either primary, secondary or supplementary. So 539,335 of these alignments have the "not primary alignment" flag set, 265,449 have the "is supplementary alignment" flag set, and 1,237,849 have neither flag set, so we refer to them as "primary" alignments.
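
As a rough sketch of how these categories fall out of the FLAG column, the Python below tallies a few records the way described above. The bit values (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; the `classify` and `tally` helpers and the toy records are made up for illustration, and I am assuming the secondary bit is checked before the supplementary one.

```python
from collections import Counter

# FLAG bits defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

def classify(flag):
    """Return 'secondary', 'supplementary' or 'primary' for a SAM FLAG value."""
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    # Neither bit set: counted as primary, whether or not it is mapped.
    return "primary"

def tally(sam_lines):
    """Count records per category from SAM text lines (header lines skipped)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        kind = classify(flag)
        counts["total"] += 1
        counts[kind] += 1
        if not flag & FLAG_UNMAPPED:
            counts["mapped"] += 1
            if kind == "primary":
                counts["primary mapped"] += 1
    return counts

if __name__ == "__main__":
    # Made-up, truncated records: only the first four columns are shown.
    toy = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t100",     # primary, mapped
        "read1\t256\tchr2\t500",   # secondary
        "read1\t2048\tchr1\t900",  # supplementary
        "read2\t4\t*\t0",          # unmapped, still counted as primary
    ]
    print(dict(tally(toy)))
```

Note that `read2` is unmapped yet lands in the "primary" bucket, simply because neither of the other two bits is set.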

Because every read must have exactly one primary alignment, no more and no less, this implies that there were 1,237,849 input reads.

Of those 1,237,849 reads/primary "alignments", 952,798 are mapped: 952,798/1,237,849 = 76.97%.

This is probably one point of confusion: a primary "alignment" isn't necessarily mapped. It's just a line in the SAM file that doesn't have the secondary or supplementary flag set. Since unmapped reads don't have the secondary or supplementary flag set, they also count as primary "alignments", even though they are not actually aligned!

In total, 1,757,582 out of 2,042,633 of the lines in the SAM file do not have the "unmapped" flag set. That's 952,798 mapped primary alignments, plus 539,335 secondary and 265,449 supplementary alignments (which are mapped by definition): 952,798 + 539,335 + 265,449 = 1,757,582.

That leaves 2,042,633 - 1,757,582 = 285,051 lines that have the "unmapped" flag set.

285,051 is also the number of primary alignments minus the number of primary mapped alignments: 1,237,849 - 952,798 = 285,051.
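
If you want to convince yourself that the report is internally consistent, the arithmetic above can be checked in a few lines of Python. The numbers are copied from the flagstat output in the question; the variable names are just mine.

```python
# Counts copied from the flagstat report in the question.
total = 2_042_633
primary = 1_237_849
secondary = 539_335
supplementary = 265_449
mapped = 1_757_582
primary_mapped = 952_798

# Every line is exactly one of primary / secondary / supplementary.
assert primary + secondary + supplementary == total

# Mapped lines = mapped primaries + all secondary + all supplementary.
assert primary_mapped + secondary + supplementary == mapped

# Unmapped lines, computed two ways, must agree.
unmapped = total - mapped
assert unmapped == primary - primary_mapped

print(f"unmapped lines:       {unmapped}")
print(f"mapped / total:       {mapped / total:.2%}")
print(f"primary mapped ratio: {primary_mapped / primary:.2%}")
```

The two percentages it prints should match the ones in brackets in the flagstat report.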


Which aligner are you using and what kind of data is this? Short or long reads?


minimap2 with a mean read length of 7000 bases.
