4.1 years ago
robert.murphy
I have an alignment of merged paired-end Illumina short reads against de novo assembled PacBio long reads, generated by
bwa mem $reference $short > $outdir/$prefix.sam
I want to assess the quality of the alignment, so I am running samtools stats with the -F flag, but I am unsure what this flag is actually showing me:
samtools stats $outdir/$prefix.sam
raw total sequences: 1815002
filtered sequences: 0
sequences: 1815002
samtools stats -F $outdir/$prefix.sam
raw total sequences: 1819392
filtered sequences: 911247
sequences: 908145
The samtools documentation is quite limited on what this is doing/showing.
What are the filtering parameters, and when did this filtering occur? Is it just the default filtering that bwa mem does, or something like that?
Use these sites that explain the various samtools flags in plain English:
https://broadinstitute.github.io/picard/explain-flags.html
https://www.samformat.info/sam-format-flag-single (check the links at the top of the page to change format)
What do you mean by that? An aligner should not be filtering anything. Soft-clipping of bases in a read that don't map is probably the closest thing.
@genomax Thank you for your response. I can't see anything to do with the -F option on these sites, and I can't find a numerical (or other) representation of -F anywhere. This seems to be a samtools-specific thing, and I am unsure if and how it is encoded in the SAM file.
I too was under the impression that aligners do no filtering, but then became confused by this samtools option. I was trying to understand where this filtering is applied and what parameters govern it.
Have you looked at the SAM format specification document? On page 7 you will find a description of the bitwise flags. You will find these flags in column 2 of every SAM alignment line.
With the -F option, these are the flags that you are specifying for doing certain operations using samtools programs. These flags can be represented numerically as well as by a hexadecimal value (the first value is numeric and the second is hex in the example below). The following example represents a read marked as a secondary alignment:
256 0x100
The sites I linked above explain what these codes mean (if you are starting with, say, 0x100) or what code to use with -F if you want to do a certain operation. I don't know off the top of my head what the default value for -F is. That is what is being used to filter your data in your original post.
Ah okay, so -F is filtering based on the SAM flags present in column 2? Thank you :)
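To make the flag arithmetic concrete, here is a minimal sketch in plain bash (no samtools required). The function names decode_flag and is_filtered are made up for illustration; the bit values are the ones listed in the SAM specification. It assumes that -F drops a read when any bit of the mask is also set in the read's FLAG (column 2), which is how the filtering flag is described for samtools.

```bash
#!/usr/bin/env bash
# Illustrative helpers only: decode_flag and is_filtered are hypothetical names,
# not part of samtools. Bit values follow the SAM specification.

# Print the names of the bits set in a SAM FLAG (decimal or 0x-prefixed hex).
decode_flag() {
    local flag=$(( $1 ))
    # Index i in this array corresponds to bit value 2^i (0x1, 0x2, 0x4, ...).
    local names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                 READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
    local out=() i
    for i in "${!names[@]}"; do
        if (( flag & (1 << i) )); then
            out+=("${names[i]}")
        fi
    done
    local IFS=,
    echo "${out[*]}"
}

# Succeed (exit 0) if a read with FLAG $1 would be excluded by -F $2,
# i.e. if the FLAG shares at least one set bit with the mask.
is_filtered() {
    (( ($1) & ($2) ))
}

decode_flag 256        # prints: SECONDARY
decode_flag 0x904      # prints: UNMAP,SECONDARY,SUPPLEMENTARY

if is_filtered 256 0x100; then
    echo "FLAG 256 is excluded by -F 0x100"
fi
if ! is_filtered 16 0x100; then
    echo "FLAG 16 is kept by -F 0x100"
fi
```

samtools also ships a flags subcommand (for example, samtools flags 0x100) that performs the same numeric/hex/name conversion, so the helper above is only meant to show what the bits mean.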