Mutect2 read orientation bias filter effect on Somatic variant calling
23 months ago
asalimih ▴ 60

Hello,
I have a paired normal blood and tumor tissue sample sequenced on Illumina, and I want to extract somatic variants. A panel of about 600 genes was sequenced.
I'm using the GATK best practices (link). While running the workflow I encountered some strange behaviour:
If I disable run_orientation_bias_mixture_model_filter, 7 variants get the PASS label:

6       407554  .       G       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=63,72|4,3;DP=143;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=209,141;MMQ=60,60;MPOS=48;NALOD=1.34;NLOD=6.02;POPAF=6.00;TLOD=9.20 GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:25,0:0.044:25:8,0:11,0:20,0:12,13,0,0       0/1:110,7:0.060:117:36,4:39,0:77,4:51,59,4,3
7       87173615        .       C       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;GERMQ=20;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=6.96  GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0   0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
7       87173617        .       G       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=6,13|1,1;DP=22;ECNT=2;GERMQ=20;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=48;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=6.96  GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0   0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
8       48869623        .       G       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=14,1|3,3;DP=21;ECNT=1;GERMQ=4;MBQ=30,20;MFRL=338,213;MMQ=60,60;MPOS=43;NALOD=0.693;NLOD=1.16;POPAF=6.00;TLOD=8.83   GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:5,0:0.169:5:0,0:3,0:4,0:4,1,0,0     0/1:10,6:0.267:16:2,0:8,2:10,3:10,0,3,3
11      119142691       .       G       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=3,23|0,3;DP=30;ECNT=1;GERMQ=35;MBQ=30,30;MFRL=361,293;MMQ=60,60;MPOS=18;NALOD=0.954;NLOD=2.41;POPAF=6.00;TLOD=7.03  GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:9,0:0.100:9:6,0:2,0:8,0:2,7,0,0     0/1:17,3:0.190:20:10,2:6,0:16,3:1,16,0,3
12      69229497        .       C       A       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=45,37|3,2;DP=90;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=228,117;MMQ=60,60;MPOS=40;NALOD=1.11;NLOD=3.61;POPAF=6.00;TLOD=7.12  GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:14,0:0.071:14:8,0:3,0:12,0:6,8,0,0  0/1:68,5:0.074:73:28,0:19,3:49,3:39,29,3,2
14      102550270       .       G       T       .       PASS    AS_FilterStatus=SITE;AS_SB_TABLE=61,61|4,4;DP=131;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=211,153;MMQ=60,60;MPOS=42;NALOD=1.26;NLOD=5.12;POPAF=6.00;TLOD=11.45        GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:22,0:0.053:22:7,0:10,0:17,0:11,11,0,0       0/1:100,8:0.074:108:28,5:44,0:73,5:50,50,4,4

But enabling it filters all of them. Here are the same 7 variants when run_orientation_bias_mixture_model_filter is enabled:

6       407554  .       G       T       .       contamination;orientation       AS_FilterStatus=contamination;AS_SB_TABLE=63,72|4,3;DP=143;ECNT=2;GERMQ=93;MBQ=20,20;MFRL=209,141;MMQ=60,60;MPOS=48;NALOD=1.34;NLOD=6.02;POPAF=6.00;ROQ=1;TLOD=9.20  GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:25,0:0.044:25:8,0:11,0:20,0:12,13,0,0       0/1:110,7:0.060:117:36,4:39,0:77,4:51,59,4,3
7       87173615        .       C       T       .       contamination;haplotype;weak_evidence   AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=6,13|1,1;DP=21;ECNT=2;GERMQ=1;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=50;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=16;TLOD=6.96     GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0        0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
7       87173617        .       G       T       .       contamination;haplotype;weak_evidence   AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=6,13|1,1;DP=22;ECNT=2;GERMQ=1;MBQ=30,20;MFRL=298,194;MMQ=60,60;MPOS=48;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=1;TLOD=6.96      GT:AD:AF:DP:F1R2:F2R1:FAD:PGT:PID:PS:SB 0|0:9,0:0.100:9:2,0:6,0:8,0:0|1:87173615_C_T:87173615:2,7,0,0        0|1:10,2:0.167:12:5,0:3,1:9,1:0|1:87173615_C_T:87173615:4,6,1,1
8       48869623        .       G       T       .       germline        AS_FilterStatus=SITE;AS_SB_TABLE=14,1|3,3;DP=21;ECNT=1;GERMQ=1;MBQ=30,20;MFRL=338,213;MMQ=60,60;MPOS=43;NALOD=0.693;NLOD=1.16;POPAF=6.00;ROQ=5;TLOD=8.83     GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:5,0:0.169:5:0,0:3,0:4,0:4,1,0,0     0/1:10,6:0.267:16:2,0:8,2:10,3:10,0,3,3
11      119142691       .       G       T       .       contamination;orientation;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=3,23|0,3;DP=30;ECNT=1;GERMQ=11;MBQ=30,30;MFRL=361,293;MMQ=60,60;MPOS=18;NALOD=0.954;NLOD=2.41;POPAF=6.00;ROQ=1;TLOD=7.03     GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:9,0:0.100:9:6,0:2,0:8,0:2,7,0,0     0/1:17,3:0.190:20:10,2:6,0:16,3:1,16,0,3
12      69229497        .       C       A       .       contamination;orientation;weak_evidence AS_FilterStatus=weak_evidence,contamination;AS_SB_TABLE=45,37|3,2;DP=90;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=228,117;MMQ=60,60;MPOS=40;NALOD=1.11;NLOD=3.61;POPAF=6.00;ROQ=1;TLOD=7.12     GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:14,0:0.071:14:8,0:3,0:12,0:6,8,0,0  0/1:68,5:0.074:73:28,0:19,3:49,3:39,29,3,2
14      102550270       .       G       T       .       contamination;orientation       AS_FilterStatus=contamination;AS_SB_TABLE=61,61|4,4;DP=131;ECNT=1;GERMQ=93;MBQ=20,20;MFRL=211,153;MMQ=60,60;MPOS=42;NALOD=1.26;NLOD=5.12;POPAF=6.00;ROQ=1;TLOD=11.45 GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/0:22,0:0.053:22:7,0:10,0:17,0:11,11,0,0       0/1:100,8:0.074:108:28,5:44,0:73,5:50,50,4,4

The only difference between these two runs is run_orientation_bias_mixture_model_filter, but as you can see, besides the orientation label, the contamination, haplotype, germline and weak_evidence filter labels are also added to those 7 variants. I can't see how enabling it would affect the contamination filter. To give more information: I'm using the gnomAD database for germline variants, ExAC for contamination, and a panel of normals. I have also enabled --genotype-germline-sites and --genotype-pon-sites. The GATK version is 4.2.6.1.
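To make the comparison between the two runs explicit, here is a minimal Python sketch that reports which FILTER labels were added per site. The helper names (`filter_labels`, `added_filters`) are hypothetical, and the records are the first seven columns of three of the lines above, truncated for brevity; on real output you would feed it the data lines of the two filtered VCFs instead.

```python
def filter_labels(vcf_line):
    """Return ((chrom, pos, ref, alt), set of FILTER labels) for one VCF data line."""
    fields = vcf_line.split()
    chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
    return (chrom, int(pos), ref, alt), set(filt.split(";"))


def added_filters(before_lines, after_lines):
    """Map each site in after_lines to the labels it gained relative to before_lines."""
    before = dict(filter_labels(line) for line in before_lines)
    gained = {}
    for line in after_lines:
        key, labels = filter_labels(line)
        # PASS is not a failure label, so it is excluded from the difference.
        gained[key] = sorted(labels - before.get(key, set()) - {"PASS"})
    return gained


# First seven columns of three records from the post (INFO/FORMAT columns omitted).
without_ob_filter = [
    "6 407554 . G T . PASS",
    "8 48869623 . G T . PASS",
    "12 69229497 . C A . PASS",
]
with_ob_filter = [
    "6 407554 . G T . contamination;orientation",
    "8 48869623 . G T . germline",
    "12 69229497 . C A . contamination;orientation;weak_evidence",
]

changes = added_filters(without_ob_filter, with_ob_filter)
for site, labels in changes.items():
    print(site, labels)
```

Only the FILTER column is compared here; INFO fields that also differ between the runs (GERMQ at some sites, for example) would need a separate comparison.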

somatic bias orientation mutect2 • 1.3k views

If I recall correctly, the filtering approach makes several passes to estimate P[error|attributes], and the final filter decision is based on the likelihood of each error category (contamination, orientation, and so on). Adding another category (orientation, in your case) changes the estimated parameters and potentially this final calculation as well. In other words, unless you specify a hard filter mechanism, adding another attribute to the filter can slightly shift the entire error model, which would produce the result above.
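As a rough illustration of that idea, here is a toy Python sketch, not GATK's actual code. It assumes each variant has one error probability per filter, that the overall error probability is the maximum of those, and that a single threshold is chosen to maximize the expected F-score over all calls. All probabilities and function names are made up for the example.

```python
def expected_f_score(error_probs, threshold):
    """Expected F1 if every call with error probability <= threshold is kept."""
    kept = [p for p in error_probs if p <= threshold]
    expected_tp = sum(1 - p for p in kept)
    expected_true = sum(1 - p for p in error_probs)
    # 2*TP + FP + FN equals (number kept) + (expected number of true variants).
    denom = len(kept) + expected_true
    return 2 * expected_tp / denom if denom else 0.0


def best_threshold(error_probs):
    """Pick the candidate threshold that maximizes the expected F-score."""
    candidates = sorted(set(error_probs))
    return max(candidates, key=lambda t: expected_f_score(error_probs, t))


def run_filtering(variants):
    """Return (threshold, per-variant list of failing filter labels)."""
    overall = [max(v.values()) for v in variants]  # toy assumption: max over filters
    threshold = best_threshold(overall)
    labels = [sorted(name for name, p in v.items() if p > threshold) for v in variants]
    return threshold, labels


# Invented probabilities: 3 clean calls, 6 ambiguous calls, and 1 call of interest (last).
contamination_only = (
    [{"contamination": 0.05}] * 3
    + [{"contamination": 0.70}] * 6
    + [{"contamination": 0.60}]
)
# Enabling the orientation filter adds a second probability to every variant.
orientation_probs = [0.01] * 3 + [0.99] * 6 + [0.01]
with_orientation = [
    dict(v, orientation=o) for v, o in zip(contamination_only, orientation_probs)
]

threshold_base, labels_base = run_filtering(contamination_only)
threshold_ob, labels_ob = run_filtering(with_orientation)
print(threshold_base, labels_base[-1])
print(threshold_ob, labels_ob[-1])
```

In this toy data the last variant's contamination probability is identical in both runs, yet it passes without the orientation column and fails with a contamination label once that column is added. The only thing that changed is the shared threshold, which moved because the orientation probabilities of other variants altered the expected F-score.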


Thanks, how can I specify the hard filter mechanism?
