Hello, I have a bam file and the output of flagstat is
**2081062 + 0 in total (QC-passed reads + QC-failed reads)**
**1434239 + 0 secondary**
13883 + 0 supplementary
0 + 0 duplicates
2057380 + 0 mapped (98.86% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
stats:
**raw total sequences: 632940**
filtered sequences: 0
sequences: 632940
is sorted: 0
1st fragments: 632940
last fragments: 0
reads mapped: 609258
reads mapped and paired: 0 # paired-end technology bit set + both mates mapped
reads unmapped: 23682
reads properly paired: 0 # proper-pair bit set
reads paired: 0 # paired-end technology bit set
reads duplicated: 0 # PCR or optical duplicate bit set
reads MQ0: 246308 # mapped and MQ=0
reads QC failed: 0
**non-primary alignments: 1434239**
total length: 580845355 # ignores clipping
total first fragment length: 580845355 # ignores clipping
total last fragment length: 0 # ignores clipping
bases mapped: 569218429 # ignores clipping
bases mapped (cigar): 536472283 # more accurate
bases trimmed: 0
bases duplicated: 0
mismatches: 71417634 # from NM fields
error rate: 1.331245e-01 # mismatches / bases mapped (cigar)
average length: 917
average first fragment length: 918
average last fragment length: 0
maximum length: 18968
maximum first fragment length: 0
maximum last fragment length: 0
average quality: 16.3
insert size average: 0.0
insert size standard deviation: 0.0
inward oriented pairs: 0
outward oriented pairs: 0
pairs with other orientation: 0
pairs on different chromosomes: 0
percentage of properly paired reads (%): 0.0
Really I have 632940 reads. I don't understand the number of 2 millions in flagstat, how could it be so? I'd like to find the reads with secondary alignment. Firstly, I'd like to know how many reads I have with secondary alignment.
Thank you a lot.
P.s I've read about 2081062 - 1434239 - 13883 = 632940. But it didn't help me with understanding the number 2081062. Also idxstats sorted.bam | awk '{s+=$3+$4} END {print s}' gives me 2081062. Alignment with human transcriptome.
2081062 - 1434239 - 13883 = 632940 are these numbers for # of alignments but not reads, right? so for 632940 reads I have total 2.081.062 alignments 1.430.239 alignments secondary 13.883 supp 632.940 primary alignments right? if I'd like to have only these 632.940 primary alignments, should i do samtools view -F 260? but it gives 623.141 -F 256 646.823
how could I check how many reads from 632.940 have the unique mappings and how many multi mappings?
in description of idxstats it is written about #reads, but not alignments.
If you want the primary entries, then
-F 2304
is what you want. Remember that reads don't need to map to have a primary entry (we normally call these "alignments", regardless of whether the entry is actually aligned, which is why idxstats calls these reads).Thank you for your explanation, seems that at last I understood. I have 2 mln entries /lines in the bam file and 632 thousands are for primary alignment, 1,4 mln for secondary and 13 thousands for supplementary. -F 2304 gives me 623141. so 9799(632.940 primary alignments) entries are out of this subset. What category are they?
I'm not sure how you got that, according to both
samtools flagstat
andsamtools idxstats
you shouldn't have.samtools flagstat
is literally just going through the entries and counting things up, so it will never be wrong.