Hello,
I have paired samples, normal blood and tumor tissue, sequenced with Illumina, and I want to extract somatic variants. A panel of about 600 genes was sequenced.
I'm using the GATK best practices (link). While running the workflow, I encountered some strange behaviour:
If I disable run_orientation_bias_mixture_model_filter, 7 variants get the PASS label:
6 407554 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=63,72|4,3;DP=143;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=209,141;MMQ=60,60;MPOS=48;NALOD=1.34;NLOD=6.02;POPAF=6.00;TLOD=9.20 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:25,0:0.044:25:8,0:11,0:20,0:12,13,0,0 0/1:110,7:0.060:117:36,4:39,0:77,4:51,59,4,3
7 87173615 . C T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;GERMQ=20;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=6.96 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0 0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
7 87173617 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=6,13|1,1;DP=22;ECNT=2;GERMQ=20;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=48;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=6.96 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0 0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
8 48869623 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=14,1|3,3;DP=21;ECNT=1;GERMQ=4;MBQ=30,20;MFRL=338,213;MMQ=60,60;MPOS=43;NALOD=0.693;NLOD=1.16;POPAF=6.00;TLOD=8.83 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:5,0:0.169:5:0,0:3,0:4,0:4,1,0,0 0/1:10,6:0.267:16:2,0:8,2:10,3:10,0,3,3
11 119142691 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=3,23|0,3;DP=30;ECNT=1;GERMQ=35;MBQ=30,30;MFRL=361,293;MMQ=60,60;MPOS=18;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=7.03 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:9,0:0.100:9:6,0:2,0:8,0:2,7,0,0 0/1:17,3:0.190:20:10,2:6,0:16,3:1,16,0,3
12 69229497 . C A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=45,37|3,2;DP=90;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=228,117;MMQ=60,60;MPOS=40;NALOD=1.11;NLOD=3.61;POPAF=6.00;TLOD=7.12 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:14,0:0.071:14:8,0:3,0:12,0:6,8,0,0 0/1:68,5:0.074:73:28,0:19,3:49,3:39,29,3,2
14 102550270 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=61,61|4,4;DP=131;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=211,153;MMQ=60,60;MPOS=42;NALOD=1.26;NLOD=5.12;POPAF=6.00;TLOD=11.45 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:22,0:0.053:22:7,0:10,0:17,0:11,11,0,0 0/1:100,8:0.074:108:28,5:44,0:73,5:50,50,4,4
But enabling it filters all of them. Here are the same 7 variants when run_orientation_bias_mixture_model_filter is enabled:
6 407554 . G T . contamination;orientation AS_FilterStatus=contamination;AS_SB_TABLE=63,72|4,3;DP=143;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=209,141;MMQ=60,60;MPOS=48;NALOD=1.34;NLOD=6.02;POPAF=6.00;ROQ=1;TLOD=9.20 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:25,0:0.044:25:8,0:11,0:20,0:12,13,0,0 0/1:110,7:0.060:117:36,4:39,0:77,4:51,59,4,3
7 87173615 . C T . contamination;haplotype;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;GERMQ=1;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=16;TLOD=6.96 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0 0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
7 87173617 . G T . contamination;haplotype;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=6,13|1,1;DP=22;ECNT=2;GERMQ=1;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=48;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=1;TLOD=6.96 GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0 0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
8 48869623 . G T . germline AS_FilterStatus=SITE;AS_SB_TABLE=14,1|3,3;DP=21;ECNT=1;GERMQ=1;MBQ=30,20;MFRL=338,213;MMQ=60,60;MPOS=43;NALOD=0.693;NLOD=1.16;POPAF=6.00;ROQ=5;TLOD=8.83 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:5,0:0.169:5:0,0:3,0:4,0:4,1,0,0 0/1:10,6:0.267:16:2,0:8,2:10,3:10,0,3,3
11 119142691 . G T . contamination;orientation;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=3,23|0,3;DP=30;ECNT=1;GERMQ=11;MBQ=30,30;MFRL=361,293;MMQ=60,60;MPOS=18;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=1;TLOD=7.03 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:9,0:0.100:9:6,0:2,0:8,0:2,7,0,0 0/1:17,3:0.190:20:10,2:6,0:16,3:1,16,0,3
12 69229497 . C A . contamination;orientation;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=45,37|3,2;DP=90;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=228,117;MMQ=60,60;MPOS=40;NALOD=1.11;NLOD=3.61;POPAF=6.00;ROQ=1;TLOD=7.12 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:14,0:0.071:14:8,0:3,0:12,0:6,8,0,0 0/1:68,5:0.074:73:28,0:19,3:49,3:39,29,3,2
14 102550270 . G T . contamination;orientation AS_FilterStatus=contamination;AS_SB_TABLE=61,61|4,4;DP=131;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=211,153;MMQ=60,60;MPOS=42;NALOD=1.26;NLOD=5.12;POPAF=6.00;ROQ=1;TLOD=11.45 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:22,0:0.053:22:7,0:10,0:17,0:11,11,0,0 0/1:100,8:0.074:108:28,5:44,0:73,5:50,50,4,4
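The F1R2 and F2R1 FORMAT fields in these records report, for each allele, how many reads were seen in each read-pair orientation. Below is a minimal Python sketch for pulling the tumor's alt-read counts per orientation out of pasted records like these. It assumes whitespace-separated columns, a single ALT allele, and the tumor as the last sample column (check the VCF header for the actual sample order); the helper names are hypothetical:

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (GT, AD, F1R2, ...) to one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def orientation_alt_counts(vcf_line):
    """Return (CHROM, POS, tumor alt reads in F1R2, tumor alt reads in F2R1).

    Assumptions: whitespace-separated columns as pasted in this thread,
    a single ALT allele, and the tumor as the last sample column.
    """
    cols = vcf_line.split()
    chrom, pos = cols[0], int(cols[1])
    tumor = parse_sample(cols[8], cols[-1])
    # F1R2 and F2R1 hold "ref,alt" read counts for each read-pair orientation.
    f1r2_alt = int(tumor["F1R2"].split(",")[1])
    f2r1_alt = int(tumor["F2R1"].split(",")[1])
    return chrom, pos, f1r2_alt, f2r1_alt


# Two of the orientation-filtered records from this thread.
RECORDS = [
    "11 119142691 . G T . contamination;orientation;weak_evidence "
    "AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=3,23|0,3;DP=30;ECNT=1;"
    "GERMQ=11;MBQ=30,30;MFRL=361,293;MMQ=60,60;MPOS=18;NALOD=0.954;NLOD=2.41;"
    "POPAF=6.00;ROQ=1;TLOD=7.03 GT:AD:AF:DP:F1R2:F2R1:FAD:SB "
    "0/0:9,0:0.100:9:6,0:2,0:8,0:2,7,0,0 0/1:17,3:0.190:20:10,2:6,0:16,3:1,16,0,3",
    "12 69229497 . C A . contamination;orientation;weak_evidence "
    "AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=45,37|3,2;DP=90;ECNT=1;"
    "GERMQ=93;MBQ=20,20;MFRL=228,117;MMQ=60,60;MPOS=40;NALOD=1.11;NLOD=3.61;"
    "POPAF=6.00;ROQ=1;TLOD=7.12 GT:AD:AF:DP:F1R2:F2R1:FAD:SB "
    "0/0:14,0:0.071:14:8,0:3,0:12,0:6,8,0,0 0/1:68,5:0.074:73:28,0:19,3:49,3:39,29,3,2",
]

for record in RECORDS:
    print(orientation_alt_counts(record))
# ('11', 119142691, 2, 0)
# ('12', 69229497, 0, 3)
```

In every record above that carries the orientation label (6:407554, 11:119142691, 12:69229497, 14:102550270), all tumor alt reads counted in F1R2/F2R1 fall in a single orientation. That is the kind of imbalance an orientation-bias artifact model is meant to catch.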
The only difference between these two runs is run_orientation_bias_mixture_model_filter, but as you can see, besides the orientation label, the contamination, haplotype, germline and weak_evidence filter labels are also added to these 7 variants. I can't see how enabling it would affect the contamination filter.
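Since the question is which annotations moved between the two runs, a small Python sketch can compare the same site from both outputs. It only looks at the first eight VCF columns and assumes whitespace-separated records as pasted here; the function names are hypothetical:

```python
def parse_record(vcf_line):
    """Split a whitespace-separated VCF data line into (site, FILTER set, INFO dict)."""
    cols = vcf_line.split()
    site = (cols[0], int(cols[1]), cols[3], cols[4])  # CHROM, POS, REF, ALT
    filters = set(cols[6].split(";"))
    info = {}
    for item in cols[7].split(";"):
        key, _, value = item.partition("=")  # flag-style keys get an empty value
        info[key] = value
    return site, filters, info


def diff_runs(line_a, line_b):
    """Return (FILTER labels added in run B, INFO keys whose values differ)."""
    site_a, filters_a, info_a = parse_record(line_a)
    site_b, filters_b, info_b = parse_record(line_b)
    if site_a != site_b:
        raise ValueError(f"records describe different sites: {site_a} vs {site_b}")
    added = sorted(filters_b - filters_a - {"PASS"})
    changed = {
        key: (info_a.get(key), info_b.get(key))
        for key in sorted(set(info_a) | set(info_b))
        if info_a.get(key) != info_b.get(key)
    }
    return added, changed


# 7:87173615 from the two runs in this thread (sample columns omitted).
WITHOUT_OB = (
    "7 87173615 . C T . PASS "
    "AS_FilterStatus=SITE;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;GERMQ=20;MBQ=30,20;"
    "MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=6.96"
)
WITH_OB = (
    "7 87173615 . C T . contamination;haplotype;weak_evidence "
    "AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;"
    "GERMQ=1;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;"
    "POPAF=6.00;ROQ=16;TLOD=6.96"
)

added, changed = diff_runs(WITHOUT_OB, WITH_OB)
print(added)  # ['contamination', 'haplotype', 'weak_evidence']
for key, (before, after) in changed.items():
    print(f"{key}: {before} -> {after}")
# AS_FilterStatus: SITE -> weak_evidence,contamination
# GERMQ: 20 -> 1
# ROQ: None -> 16
```

For 7:87173615 this reports the three new FILTER labels and the new ROQ annotation. It also shows GERMQ dropping from 20 to 1, so the two outputs differ in more than the orientation-related fields.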
To give more information: I'm using the gnomAD database as the germline resource, ExAC for contamination, and a panel of normals. I have also enabled --genotype-germline-sites and --genotype-pon-sites. The GATK version is 4.2.6.1.
If I recall correctly, the filtering approach uses several passes to determine P[error|attributes], and a final filter is determined based on the likelihood of each error category (e.g., contamination, orientation). By specifying an additional category (in your case, orientation) you alter the estimated parameters and potentially this final calculation as well. In other words, if you're not specifying the hard filter mechanism, adding another attribute for your filter can slightly alter the entire error model, resulting in the case above.
Thanks, how can I specify the hard filter mechanism?
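The probability-based filtering described in the answer above can be illustrated with a toy Python model. This is not GATK's implementation. It assumes each error category contributes an independent artifact probability, combines them as P(error) = 1 - prod_i(1 - p_i), and then picks the cutoff that maximizes an expected F1 score over the whole callset. The function names and all probabilities are hypothetical:

```python
def combined_error_prob(artifact_probs):
    """Toy combination: P(error) = 1 - prod_i (1 - p_i), categories assumed independent."""
    p_real = 1.0
    for p in artifact_probs.values():
        p_real *= 1.0 - p
    return 1.0 - p_real


def optimal_f_threshold(error_probs, beta=1.0):
    """Return the error-probability cutoff that maximizes the expected F-beta score.

    A passing call with error probability p adds p expected false positives and
    (1 - p) expected true positives; a filtered call adds (1 - p) expected false
    negatives. Calls with p <= cutoff pass.
    """
    b2 = beta * beta
    best_cutoff, best_f = 0.0, -1.0
    for cutoff in sorted(set(error_probs)):
        passed = [p for p in error_probs if p <= cutoff]
        filtered = [p for p in error_probs if p > cutoff]
        tp = sum(1.0 - p for p in passed)
        fp = sum(passed)
        fn = sum(1.0 - p for p in filtered)
        f = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)
        if f > best_f:
            best_cutoff, best_f = cutoff, f
    return best_cutoff


# Hypothetical per-category artifact probabilities for four calls.
calls = [
    {"contamination": 0.01},
    {"contamination": 0.02},
    {"contamination": 0.25},
    {"contamination": 0.30},
]
probs_without = [combined_error_prob(c) for c in calls]

# Same calls, but an orientation category now gives the last call p = 0.5.
calls_with = [dict(c) for c in calls]
calls_with[-1]["orientation"] = 0.5
probs_with = [combined_error_prob(c) for c in calls_with]

print(optimal_f_threshold(probs_without))  # about 0.30: all four calls pass
print(optimal_f_threshold(probs_with))     # about 0.25: the last call (now about 0.65) is filtered
```

Only the last call's probabilities change, yet the cutoff applied to every call moves (from about 0.30 to about 0.25), because it is estimated from the whole callset. That is the sense in which adding a category can change parameters beyond the category itself.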