Hi,
I ran BWA-MEM on the same sample with different numbers of threads, and the output SAM files are different.
My bwa version is 0.7.13.
I saw this entry in the bwa release notes (https://github.com/lh3/bwa/blob/master/NEWS.md):
Release 0.7.13 (23 Feburary 2016)
Fixed a potential bug in the multithreading mode. It may occur when mapping is much faster than file reading, which should almost never happen in practice.
Why are the outputs still different?
Multi-threaded run (samtools flagstat):
150380122 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145716 + 0 supplementary
0 + 0 duplicates
150143022 + 0 mapped (99.84% : N/A)
150234406 + 0 paired in sequencing
75117203 + 0 read1
75117203 + 0 read2
148053330 + 0 properly paired (98.55% : N/A)
149791510 + 0 with itself and mate mapped
205796 + 0 singletons (0.14% : N/A)
1519238 + 0 with mate mapped to a different chr
1330739 + 0 with mate mapped to a different chr (mapQ>=5)
Single-threaded run (samtools flagstat):
150380124 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145718 + 0 supplementary
0 + 0 duplicates
150143026 + 0 mapped (99.84% : N/A)
150234406 + 0 paired in sequencing
75117203 + 0 read1
75117203 + 0 read2
148053228 + 0 properly paired (98.55% : N/A)
149791516 + 0 with itself and mate mapped
205792 + 0 singletons (0.14% : N/A)
1519274 + 0 with mate mapped to a different chr
1330758 + 0 with mate mapped to a different chr (mapQ>=5)
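To see how small the differences are, the two flagstat reports above can be compared category by category. The following is only a minimal sketch: the function names `parse_flagstat` and `flagstat_diff` are made up for illustration, and the parsing assumes the plain-text flagstat layout shown above (count, `+`, QC-failed count, label).

```python
import re

# Counts copied verbatim from the two flagstat reports above.
THREADED = """\
150380122 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145716 + 0 supplementary
0 + 0 duplicates
150143022 + 0 mapped (99.84% : N/A)
150234406 + 0 paired in sequencing
75117203 + 0 read1
75117203 + 0 read2
148053330 + 0 properly paired (98.55% : N/A)
149791510 + 0 with itself and mate mapped
205796 + 0 singletons (0.14% : N/A)
1519238 + 0 with mate mapped to a different chr
1330739 + 0 with mate mapped to a different chr (mapQ>=5)
"""

NO_THREAD = """\
150380124 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
145718 + 0 supplementary
0 + 0 duplicates
150143026 + 0 mapped (99.84% : N/A)
150234406 + 0 paired in sequencing
75117203 + 0 read1
75117203 + 0 read2
148053228 + 0 properly paired (98.55% : N/A)
149791516 + 0 with itself and mate mapped
205792 + 0 singletons (0.14% : N/A)
1519274 + 0 with mate mapped to a different chr
1330758 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each flagstat category label to its QC-passed read count."""
    counts = {}
    for line in text.strip().splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.+)", line.strip())
        if not m:
            continue
        # Drop a trailing percentage such as " (99.84% : N/A)" so that
        # labels from both reports match even if the percentages differ.
        label = re.sub(r"\s*\([\d.]+% : [^)]*\)$", "", m.group(3))
        counts[label] = int(m.group(1))
    return counts


def flagstat_diff(a, b):
    """Return {label: b - a} for every category whose count differs."""
    return {k: b[k] - a[k] for k in a if k in b and a[k] != b[k]}


diff = flagstat_diff(parse_flagstat(THREADED), parse_flagstat(NO_THREAD))
for label, delta in diff.items():
    print(f"{label}: {delta:+d}")
```

The largest change is in the properly paired count, which differs by 102 reads out of roughly 148 million.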
First, thanks for the information.
I'm confused. If the reads that differ are low quality, shouldn't they be filtered out later in the pipeline, by a step like 'base recalibrator'?
When I ran Mutect2 on the two different alignments of the same sample, the outputs contained slightly different numbers of 'PASS' mutations.
Is this not a problem in practice?
Thanks again. (I am a bioinformatics newbie, sorry for the basic question.)
Not necessarily. It depends on the variant-caller. There will always be some borderline cases. Let's say any variant with quality>20 passes filter, and variants with quality<20 fail filter. A single read mapped to a different place might change a variant from quality 20.07 to quality 19.91. So it goes from PASS to FAIL. It's still the same variant, and the evidence is basically the same.
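The borderline effect can be shown in a few lines. This is a toy sketch, not how any particular variant-caller works: `filter_status` is a hypothetical helper, the threshold of 20 and the two quality values come from the example above, and treating a quality of exactly 20 as PASS is an assumption.

```python
def filter_status(quality, threshold=20.0):
    """Label a variant PASS or FAIL with a hard quality cutoff.

    Assumption: a quality exactly equal to the threshold counts as PASS.
    """
    return "PASS" if quality >= threshold else "FAIL"


# The same variant scored in two runs whose alignments differ by one read.
quality_run_a = 20.07
quality_run_b = 19.91

print(filter_status(quality_run_a))  # PASS
print(filter_status(quality_run_b))  # FAIL
```

The two scores differ by less than 1%, yet the labels are opposite, so the PASS counts of the two runs differ even though the evidence is nearly identical.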
You have to understand that you are using programs as a proxy for your own intelligence. You could, of course, examine all 20 million variants individually, and if you were perfect, you'd do a better job than the variant-caller. But in practice you'd get bored and make mistakes (and it would take forever). So you are using variant-callers as a shortcut. As such, it's a good idea to study the variant calls until you can tell which ones are obviously true and which are obviously false. With enough experience, your knowledge will be better than the hard-coded rules of current variant-callers. At that point, you can set a somewhat lower threshold and manually examine all borderline cases to determine which are real and which are not. I guess a variant-caller should be considered more like triage than diagnosis.
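As a rough sketch of that triage idea: instead of a single hard cutoff, split the calls into those to accept, those to reject, and a borderline band to inspect by hand. Everything here is hypothetical and for illustration only: the `triage` function, the margin of 2 quality units, and the example variants and scores are all invented.

```python
def triage(variants, threshold=20.0, margin=2.0):
    """Split (name, quality) pairs into accept / review / reject buckets.

    Calls within `margin` of the threshold go to "review" for manual
    inspection instead of being passed or failed automatically.
    """
    buckets = {"accept": [], "review": [], "reject": []}
    for name, quality in variants:
        if quality >= threshold + margin:
            buckets["accept"].append(name)
        elif quality >= threshold - margin:
            buckets["review"].append(name)
        else:
            buckets["reject"].append(name)
    return buckets


# Invented example calls; names and scores are not from any real data.
calls = [("var1", 35.2), ("var2", 20.07), ("var3", 19.91), ("var4", 8.4)]
result = triage(calls)
print(result)
```

With a band like this, a variant scoring 20.07 in one run and 19.91 in another lands in the review bucket both times, so the small alignment differences no longer change its outcome.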
I appreciate your help. Thanks again!