Hello all,
I'm doing the QC of bam files. I found there are 2 ways to remove the low mapping quality reads.
One is to remove reads with MAPQ < 30, like samtools view -h -b -q 30 -U below_q30.bam aligned.bam
The other is to remove reads by flag 512, like samtools view -F 512 -b sorted.bam > filtered_sorted.bam
Flag 512 represents the reads not passing the quality control when doing alignment by samtools.
I'm wondering whether it's necessary to do both of these 2 steps. I mean, what is the MAPQ value of flag 512? If flag 512 has the same MAPQ value as 30, then there's no need to do both steps, right?
Thanks!
Best,
YJ
Hello Istvan Albert, thanks for the response! Do you know which flags really work, and which are just placeholders? I found that removing unmapped reads by -F 4 works well, whereas -F 512 and -F 1024 are doing nothing. So, I should remove low-quality reads by -q 30, right? But how shall I remove the duplicated reads? BTW, is it necessary to do as much filtering as the picture below shows? Thanks in advance.
I would recommend reformulating the term "low quality" alignment into concepts that are more specific to alignments.
Low-quality is just a catch-all term that leaves you with little actionable information.
When you produce alignments you get a mapping quality, an alignment score, a number of mismatches, and an alignment length. You also have the average read quality, and you can remove identical reads or identical alignments.
You can filter for any or all of the above if you wish and call that "low-quality".
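To make those criteria concrete, here is a minimal sketch that applies each one as a separate, named test on a SAM record. It assumes plain-text SAM on stdin; the helper name filter_sam and the thresholds are hypothetical and only for illustration. It relies on the SAM layout: FLAG is column 2, MAPQ is column 5, and optional tags such as NM (edit distance, which not every aligner writes) start at column 12. Note that FLAG and MAPQ are independent fields, so a read with flag 512 has no particular MAPQ value.

```shell
#!/bin/sh
# filter_sam MIN_MAPQ EXCLUDE_FLAGS MAX_NM   (hypothetical helper, not a samtools command)
# Reads SAM text on stdin and writes the records that pass all three tests to stdout.
filter_sam() {
    awk -F '\t' -v min_mapq="$1" -v exclude="$2" -v max_nm="$3" '
    # Return 1 if FLAG shares any set bit with MASK (plain arithmetic, portable awk).
    function has_any_bit(flag, mask,    bit) {
        for (bit = 1; bit <= mask; bit *= 2)
            if (int(mask / bit) % 2 == 1 && int(flag / bit) % 2 == 1)
                return 1
        return 0
    }
    /^@/ { print; next }                  # keep header lines unchanged
    $5 < min_mapq { next }                # criterion 1: mapping quality (column 5)
    has_any_bit($2, exclude) { next }     # criterion 2: any excluded FLAG bit (column 2)
    {
        for (i = 12; i <= NF; i++)        # criterion 3: edit distance from the NM tag, if present
            if ($i ~ /^NM:i:/ && substr($i, 6) + 0 > max_nm) next
        print
    }'
}
```

With samtools installed, this could be used as samtools view -h aligned.bam | filter_sam 30 1536 5 | samtools view -b -o filtered.bam - where 1536 is 512 + 1024. Bits 512 (failed platform/vendor QC) and 1024 (duplicate) are only set if an upstream tool set them, for example a duplicate-marking step such as samtools markdup or Picard MarkDuplicates, so excluding them from a BAM where nothing was marked removes nothing.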
But in the end, I would recommend some caution: it is not so clear when an alignment is "low-quality". In the vast majority of cases the reference genome is not the same as the true genome under study.
Filtering for low quality might just filter out the regions that make your genome unique and specific to the phenotype. More damage can be done by filtering data incorrectly than by not filtering at all.
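One cautious way to act on this is to count what each criterion would remove before removing anything. The sketch below is an assumption-laden illustration, not a standard tool: count_filter_effects is a hypothetical helper that reads plain-text SAM on stdin and tallies records by FLAG bits 4 (unmapped), 512 (failed QC), 1024 (duplicate), and by MAPQ below a threshold you choose.

```shell
#!/bin/sh
# count_filter_effects MIN_MAPQ   (hypothetical helper, not a samtools command)
# Reads SAM text on stdin and reports how many records each criterion would drop.
count_filter_effects() {
    awk -F '\t' -v min_mapq="$1" '
    # Return 1 if the single bit BIT is set in FLAG.
    function has_bit(flag, bit) { return int(flag / bit) % 2 }
    /^@/ { next }                              # skip header lines
    {
        total++
        if (has_bit($2, 4))    unmapped++      # FLAG 4: read unmapped
        if (has_bit($2, 512))  qc_fail++       # FLAG 512: failed platform/vendor QC
        if (has_bit($2, 1024)) duplicate++     # FLAG 1024: marked as duplicate
        if ($5 < min_mapq)     low_mapq++      # MAPQ (column 5) below threshold
    }
    END {
        printf "total\t%d\n", total
        printf "unmapped\t%d\n", unmapped
        printf "qc_fail\t%d\n", qc_fail
        printf "duplicate\t%d\n", duplicate
        printf "mapq_below_%d\t%d\n", min_mapq, low_mapq
    }'
}
```

For example, samtools view aligned.bam | count_filter_effects 30 prints one tab-separated count per criterion. A record can be counted under more than one criterion, so the counts need not sum to the total. If a count is zero, the matching filter does nothing on that file, and if a count is a large share of the total, it is worth looking at where those reads fall before discarding them. samtools flagstat reports similar flag-based counts.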