Vcf produced after ApplyVSQR step
1
0
Entering edit mode
5.3 years ago
BAGeno ▴ 190

Hi,

I am using variant recalibration for the first time. I have performed both steps of VariantRecalibrator and ApplyVSQR. I want to know whether I am performing steps correctly.

Here are my commands

gatk VariantRecalibrator -R new_hg38.fa \
                         -V allsample.vcf \
                         --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
                         --resource omni,known=false,training=true,truth=false,prior=12.0:omni2.5.hg38.vcf.gz \
                         --resource 1000G,known=false,training=true,truth=false,prior=10.0:thouzndG_phase1.snps.high_confidence.hg38.vcf.gz  \
                         --resource dbsnp,known=true,training=false,truth=false,prior=2.0:All_20180418.chr.hg38.vcf.gz \
                         -an QD \
                         -an MQ \
                         -an MQRankSum \
                         -an ReadPosRankSum \
                         -an FS \
                         -an SOR \
                         -mode SNP \
                         -O output.snp.recal  \
                         --tranches-file output.snp.tranches \
                         --rscript-file output.snp..plots.R

And

gatk ApplyVQSR -R new_hg38.fa \
               -V allsample.vcf \
               -O output.snps.vcf \
               --truth-sensitivity-filter-level 99.0 \
               --tranches-file output.snp.tranches  \
               --recal-file output.snp.recal \
               -mode SNP

This is my VCF.

chr1    13273   rs531730856 G   C   1023.88 VQSRTrancheSNP99.90to100.00 AC=6;AF=0.250;AN=24;AS_InbreedingCoeff=0.6320;AS_QD=12.80;BaseQRankSum=0.551;DB;DP=177;ExcessHet=0.0742;FS=12.051;InbreedingCoeff=0.6320;MLEAC=6;MLEAF=0.250;MQ=31.37;MQRankSum=0.589;NEGATIVE_TRAIN_SITE;QD=12.80;ReadPosRankSum=1.08;SOR=2.124;VQSLOD=-6.853e+00;culprit=MQRankSum    GT:AD:DP:GQ:PL  0/1:36,27:63:99:647,0,878   0/0:29,0:29:81:0,81,1215    1/1:0,4:4:12:116,12,0   ./.:0,0:0:.:0,0,0   1/1:0,4:4:12:129,12,0   0/0:8,0:8:24:0,24,246   ./.:0,0:0:.:0,0,0   0/0:19,0:19:51:0,51,765 0/1:1,8:9:0:197,0,0 0/0:10,0:10:27:0,27,405 0/0:4,0:4:12:0,12,131   0/0:5,0:5:15:0,15,204   0/0:16,0:16:39:0,39,585 ./.:0,0:0:.:0,0,0   0/0:6,0:6:15:0,15,225
chr1    13417   rs777038595 C   CGAGA   1426.83 .   AC=4;AF=0.167;AN=24;AS_InbreedingCoeff=0.2256;AS_QD=19.55;BaseQRankSum=-1.620e-01;DB;DP=154;ExcessHet=0.7463;FS=0.000;InbreedingCoeff=0.2256;MLEAC=5;MLEAF=0.208;MQ=21.09;MQRankSum=0.666;QD=19.55;ReadPosRankSum=-6.660e-01;SOR=1.179  GT:AD:DP:GQ:PL  0/1:20,22:42:99:709,0,1128  0/0:17,0:17:45:0,45,675 1/1:0,8:8:24:280,24,0   0/0:11,0:11:27:0,27,405 ./.:4,0:4:.:0,0,0   0/0:5,0:5:15:0,15,182   ./.:0,0:0:.:0,0,0   0/0:16,0:16:42:0,42,630 0/1:9,14:23:99:501,0,464    0/0:4,0:4:12:0,12,141   0/0:9,0:9:21:0,21,315   0/0:3,0:3:0:0,0,41  0/0:7,0:7:21:0,21,285   ./.:0,0:0:.:0,0,0   0/0:3,0:3:0:0,0,41
chr1    13649   rs879707275 G   C   104.2   VQSRTrancheSNP99.00to99.90  AC=2;AF=0.100;AN=20;AS_InbreedingCoeff=-0.2109;AS_QD=5.48;BaseQRankSum=0.00;DB;DP=53;ExcessHet=3.2736;FS=0.000;InbreedingCoeff=-0.2109;MLEAC=2;MLEAF=0.100;MQ=17.80;MQRankSum=0.00;QD=5.48;ReadPosRankSum=-3.350e-01;SOR=0.446;VQSLOD=-2.378e-01;culprit=MQ GT:AD:DP:GQ:PL  0/0:5,0:5:15:0,15,189   0/1:3,4:7:53:78,0,53    ./.:2,0:2:.:0,0,0   ./.:3,0:3:.:0,0,0   0/0:1,0:1:3:0,3,19  0/1:8,4:12:63:63,0,162  ./.:0,0:0:.:0,0,0   0/0:1,0:1:3:0,3,19  ./.:0,0:0:.:0,0,0   ./.:0,0:0:.:0,0,0   0/0:4,0:4:0:0,0,82  0/0:1,0:1:3:0,3,19  0/0:8,0:8:24:0,24,304   0/0:3,0:3:9:0,9,96  0/0:6,0:6:18:0,18,218
chr1    14574   rs28503599  A   G   71.21   VQSRTrancheSNP99.90to100.00 AC=5;AF=0.208;AN=24;AS_InbreedingCoeff=0.2613;AS_QD=14.24;BaseQRankSum=0.00;DB;DP=39;ExcessHet=0.0824;FS=0.000;InbreedingCoeff=0.2613;MLEAC=4;MLEAF=0.167;MQ=18.97;MQRankSum=-9.670e-01;NEGATIVE_TRAIN_SITE;QD=14.24;ReadPosRankSum=0.967;SOR=2.303;VQSLOD=-4.449e+00;culprit=MQ    GT:AD:DP:GQ:PL  0/0:13,0:13:39:0,39,447 ./.:2,0:2:.:0,0,0   ./.:0,0:0:.:0,0,0   0/0:1,0:1:3:0,3,19  0/0:1,0:1:3:0,3,19  0/1:1,2:3:21:40,0,21    1/1:0,2:2:6:54,6,0  0/0:3,0:3:9:0,9,107 0/0:1,0:1:3:0,3,19  ./.:0,0:0:.:0,0,0   0/0:4,0:4:12:0,12,140   0/0:2,0:2:6:0,6,59  0/0:3,0:3:0:0,0,22  1/1:0,1:1:3:26,3,0  0/0:3,0:3:9:0,9,100
chr1    14590   rs707679    G   A   17.26   VQSRTrancheSNP99.90to100.00 AC=2;AF=0.077;AN=26;AS_InbreedingCoeff=0.2729;AS_QD=8.63;DB;DP=47;ExcessHet=0.1047;FS=0.000;InbreedingCoeff=0.2729;MLEAC=1;MLEAF=0.038;MQ=22.51;NEGATIVE_TRAIN_SITE;QD=8.63;SOR=2.303;VQSLOD=-3.776e+00;culprit=MQ  GT:AD:DP:GQ:PL  0/0:18,0:18:51:0,51,765 0/0:3,0:3:0:0,0,41  0/0:2,0:2:6:0,6,78  0/0:1,0:1:3:0,3,19  ./.:0,0:0:.:0,0,0   0/0:3,0:3:0:0,0,4   1/1:0,2:2:6:54,6,0  0/0:5,0:5:15:0,15,137   0/0:2,0:2:6:0,6,59  0/0:1,0:1:3:0,3,37  0/0:3,0:3:9:0,9,100 0/0:2,0:2:6:0,6,59  0/0:3,0:3:9:0,9,100 ./.:0,0:0:.:0,0,0   0/0:2,0:2:6:0,6,59

Can you please tell me if that should be the output of variantreclibration step as I do not have FILTER field annotated as mentioned on gatk website. Also how should I proceed next whether should I filter these variants?

gatk ApplyVSQR • 3.2k views
ADD COMMENT
0
Entering edit mode

I've formatted your post so that your code blocks are more readable

ADD REPLY
0
Entering edit mode

Could you post your tranche plot ( VariantRecalibrator should output you a pdf containing it). Also is it WGS ? or targeted (exome, gene panel) ?

ADD REPLY
0
Entering edit mode

This is tranches file.

# Variant quality score tranches file
# Version number 5
targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity
90.00,2302437,371395,2.1178,1.4051,2.9424,VQSRTrancheSNP0.00to90.00,SNP,1291939,1162745,0.9000
99.00,2806818,409886,2.0969,1.4087,0.9599,VQSRTrancheSNP90.00to99.00,SNP,1291939,1279019,0.9900
99.90,2907437,425455,2.0915,1.4023,-2.7966,VQSRTrancheSNP99.00to99.90,SNP,1291939,1290647,0.9990
100.00,2982702,454343,2.0834,1.3928,-36326.3108,VQSRTrancheSNP99.90to100.00,SNP,1291939,1291939,1.0000
ADD REPLY
0
Entering edit mode

could you post the plot. It's easier to read ;)

ADD REPLY
0
Entering edit mode

Could you please tell me how should I plot graphs? I have R script made by VariantRecalibrator and also downloaded ggplot2 in R.

ADD REPLY
0
Entering edit mode

This is my tranches pdf file. Can you please tell me if its fine?

ADD REPLY
0
Entering edit mode

Can you clarify if this is WES / WGS / Targeted sequencing?

ADD REPLY
0
Entering edit mode

This data is of exome sequencing

ADD REPLY
1
Entering edit mode
5.3 years ago

I'd check out this post which tries to explain VQSR all in one. Particularly look at the How ApplyRecalibration works in a nutshell section:

During the first part of the recalibration process, variants in your callset were given a score called VQSLOD. At the same time, variants in your training sets were also ranked by VQSLOD. When you specify a tranche sensitivity threshold with ApplyRecalibration, expressed as a percentage (e.g. 99.9%), what happens is that the program looks at what is the VQSLOD value above which 99.9% of the variants in the training callset are included. It then takes that value of VQSLOD and uses it as a threshold to filter your variants. Variants that are above the threshold pass the filter, so the FILTER field will contain PASS. Variants that are below the threshold will be filtered out; they will be written to the output file, but in the FILTER field they will have the name of the tranche they belonged to. So VQSRTrancheSNP99.90to100.00 means that the variant was in the range of VQSLODs corresponding to the remaining 0.1% of the training set, which are basically considered false positives.

ADD COMMENT
1
Entering edit mode

So should I filter any sites which are not in my defined range of tranches?

ADD REPLY

Login before adding your answer.

Traffic: 2145 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6