Hi,
I am using variant recalibration for the first time. I have performed both steps of VariantRecalibrator and ApplyVSQR. I want to know whether I am performing steps correctly.
Here are my commands
gatk VariantRecalibrator -R new_hg38.fa \
-V allsample.vcf \
--resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
--resource omni,known=false,training=true,truth=false,prior=12.0:omni2.5.hg38.vcf.gz \
--resource 1000G,known=false,training=true,truth=false,prior=10.0:thouzndG_phase1.snps.high_confidence.hg38.vcf.gz \
--resource dbsnp,known=true,training=false,truth=false,prior=2.0:All_20180418.chr.hg38.vcf.gz \
-an QD \
-an MQ \
-an MQRankSum \
-an ReadPosRankSum \
-an FS \
-an SOR \
-mode SNP \
-O output.snp.recal \
--tranches-file output.snp.tranches \
--rscript-file output.snp..plots.R
And
gatk ApplyVQSR -R new_hg38.fa \
-V allsample.vcf \
-O output.snps.vcf \
--truth-sensitivity-filter-level 99.0 \
--tranches-file output.snp.tranches \
--recal-file output.snp.recal \
-mode SNP
This is my VCF.
chr1 13273 rs531730856 G C 1023.88 VQSRTrancheSNP99.90to100.00 AC=6;AF=0.250;AN=24;AS_InbreedingCoeff=0.6320;AS_QD=12.80;BaseQRankSum=0.551;DB;DP=177;ExcessHet=0.0742;FS=12.051;InbreedingCoeff=0.6320;MLEAC=6;MLEAF=0.250;MQ=31.37;MQRankSum=0.589;NEGATIVE_TRAIN_SITE;QD=12.80;ReadPosRankSum=1.08;SOR=2.124;VQSLOD=-6.853e+00;culprit=MQRankSum GT:AD:DP:GQ:PL 0/1:36,27:63:99:647,0,878 0/0:29,0:29:81:0,81,1215 1/1:0,4:4:12:116,12,0 ./.:0,0:0:.:0,0,0 1/1:0,4:4:12:129,12,0 0/0:8,0:8:24:0,24,246 ./.:0,0:0:.:0,0,0 0/0:19,0:19:51:0,51,765 0/1:1,8:9:0:197,0,0 0/0:10,0:10:27:0,27,405 0/0:4,0:4:12:0,12,131 0/0:5,0:5:15:0,15,204 0/0:16,0:16:39:0,39,585 ./.:0,0:0:.:0,0,0 0/0:6,0:6:15:0,15,225
chr1 13417 rs777038595 C CGAGA 1426.83 . AC=4;AF=0.167;AN=24;AS_InbreedingCoeff=0.2256;AS_QD=19.55;BaseQRankSum=-1.620e-01;DB;DP=154;ExcessHet=0.7463;FS=0.000;InbreedingCoeff=0.2256;MLEAC=5;MLEAF=0.208;MQ=21.09;MQRankSum=0.666;QD=19.55;ReadPosRankSum=-6.660e-01;SOR=1.179 GT:AD:DP:GQ:PL 0/1:20,22:42:99:709,0,1128 0/0:17,0:17:45:0,45,675 1/1:0,8:8:24:280,24,0 0/0:11,0:11:27:0,27,405 ./.:4,0:4:.:0,0,0 0/0:5,0:5:15:0,15,182 ./.:0,0:0:.:0,0,0 0/0:16,0:16:42:0,42,630 0/1:9,14:23:99:501,0,464 0/0:4,0:4:12:0,12,141 0/0:9,0:9:21:0,21,315 0/0:3,0:3:0:0,0,41 0/0:7,0:7:21:0,21,285 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,41
chr1 13649 rs879707275 G C 104.2 VQSRTrancheSNP99.00to99.90 AC=2;AF=0.100;AN=20;AS_InbreedingCoeff=-0.2109;AS_QD=5.48;BaseQRankSum=0.00;DB;DP=53;ExcessHet=3.2736;FS=0.000;InbreedingCoeff=-0.2109;MLEAC=2;MLEAF=0.100;MQ=17.80;MQRankSum=0.00;QD=5.48;ReadPosRankSum=-3.350e-01;SOR=0.446;VQSLOD=-2.378e-01;culprit=MQ GT:AD:DP:GQ:PL 0/0:5,0:5:15:0,15,189 0/1:3,4:7:53:78,0,53 ./.:2,0:2:.:0,0,0 ./.:3,0:3:.:0,0,0 0/0:1,0:1:3:0,3,19 0/1:8,4:12:63:63,0,162 ./.:0,0:0:.:0,0,0 0/0:1,0:1:3:0,3,19 ./.:0,0:0:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:4,0:4:0:0,0,82 0/0:1,0:1:3:0,3,19 0/0:8,0:8:24:0,24,304 0/0:3,0:3:9:0,9,96 0/0:6,0:6:18:0,18,218
chr1 14574 rs28503599 A G 71.21 VQSRTrancheSNP99.90to100.00 AC=5;AF=0.208;AN=24;AS_InbreedingCoeff=0.2613;AS_QD=14.24;BaseQRankSum=0.00;DB;DP=39;ExcessHet=0.0824;FS=0.000;InbreedingCoeff=0.2613;MLEAC=4;MLEAF=0.167;MQ=18.97;MQRankSum=-9.670e-01;NEGATIVE_TRAIN_SITE;QD=14.24;ReadPosRankSum=0.967;SOR=2.303;VQSLOD=-4.449e+00;culprit=MQ GT:AD:DP:GQ:PL 0/0:13,0:13:39:0,39,447 ./.:2,0:2:.:0,0,0 ./.:0,0:0:.:0,0,0 0/0:1,0:1:3:0,3,19 0/0:1,0:1:3:0,3,19 0/1:1,2:3:21:40,0,21 1/1:0,2:2:6:54,6,0 0/0:3,0:3:9:0,9,107 0/0:1,0:1:3:0,3,19 ./.:0,0:0:.:0,0,0 0/0:4,0:4:12:0,12,140 0/0:2,0:2:6:0,6,59 0/0:3,0:3:0:0,0,22 1/1:0,1:1:3:26,3,0 0/0:3,0:3:9:0,9,100
chr1 14590 rs707679 G A 17.26 VQSRTrancheSNP99.90to100.00 AC=2;AF=0.077;AN=26;AS_InbreedingCoeff=0.2729;AS_QD=8.63;DB;DP=47;ExcessHet=0.1047;FS=0.000;InbreedingCoeff=0.2729;MLEAC=1;MLEAF=0.038;MQ=22.51;NEGATIVE_TRAIN_SITE;QD=8.63;SOR=2.303;VQSLOD=-3.776e+00;culprit=MQ GT:AD:DP:GQ:PL 0/0:18,0:18:51:0,51,765 0/0:3,0:3:0:0,0,41 0/0:2,0:2:6:0,6,78 0/0:1,0:1:3:0,3,19 ./.:0,0:0:.:0,0,0 0/0:3,0:3:0:0,0,4 1/1:0,2:2:6:54,6,0 0/0:5,0:5:15:0,15,137 0/0:2,0:2:6:0,6,59 0/0:1,0:1:3:0,3,37 0/0:3,0:3:9:0,9,100 0/0:2,0:2:6:0,6,59 0/0:3,0:3:9:0,9,100 ./.:0,0:0:.:0,0,0 0/0:2,0:2:6:0,6,59
Can you please tell me if that should be the output of variantreclibration step as I do not have FILTER field annotated as mentioned on gatk website. Also how should I proceed next whether should I filter these variants?
I've formatted your post so that your code blocks are more readable
Could you post your tranche plot ( VariantRecalibrator should output you a pdf containing it). Also is it WGS ? or targeted (exome, gene panel) ?
This is tranches file.
could you post the plot. It's easier to read ;)
Could you please tell me how should I plot graphs? I have R script made by
VariantRecalibrator
and also downloadedggplot2
in R.This is my tranches pdf file. Can you please tell me if its fine?
Can you clarify if this is WES / WGS / Targeted sequencing?
This data is of exome sequencing