Dear all,
I would appreciate your suggestions on the following. I am working with a VCF file produced by Strelka2 on tumor-normal pairs. As is well known, Strelka2 does not report allele depth (AD) or variant allele fraction (VAF) in the VCF FORMAT fields. I used Purple to add AD to the Strelka2 VCF file and a bcftools plugin to add VAF. The VCF file now has the following fields:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
chr1 54720 . C CTT . PASS IC=5;IHP=6;MQ=54.6;MQ0=1;NT=ref;QSI=40;QSI_NT=40;RC=3;RU=T;SGT=ref>het; SOMATIC;SomaticEVS=17.29;TQSI=2;TQSI_NT=2
AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF
34,0:0.06:29:29:36.35:2.57:0:34,40:0,0:4,0:0 [NORMAL]
3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5 [TUMOR]
I would like to filter the VCF file based on AD and VAF fields in the "FORMAT" section of TUMOR and NORMAL.
In the example above, AD in the NORMAL sample is 34,0, where 34 is the depth of the REF allele and 0 is the depth of the ALT allele. In the TUMOR sample, AD is 3,3, where the first 3 is the depth of the REF allele and the second 3 is the depth of the ALT allele.
As for VAF, it is 0 in the NORMAL sample and 0.5 in the TUMOR sample.
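For clarity, this is how I read the per-sample fields: the FORMAT string gives the keys, and each sample column gives the values in the same order. Below is a minimal Python sketch of that mapping, using the values from the record above. The helper name `parse_sample` is only for illustration and is not part of any tool.

```python
# Map FORMAT keys onto one sample column; values are copied from the record above.
FORMAT = "AD:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR:VAF"
NORMAL = "34,0:0.06:29:29:36.35:2.57:0:34,40:0,0:4,0:0"
TUMOR = "3,3:0:3:3:4.56:0:0:3,4:3,3:0,1:0.5"


def parse_sample(format_field, sample_field):
    """Return a dict of FORMAT key -> raw string value (hypothetical helper)."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


normal = parse_sample(FORMAT, NORMAL)
tumor = parse_sample(FORMAT, TUMOR)

# AD is "REF,ALT" for a record with one ALT allele; VAF is a single number.
normal_ref_ad, normal_alt_ad = (int(x) for x in normal["AD"].split(","))
tumor_ref_ad, tumor_alt_ad = (int(x) for x in tumor["AD"].split(","))

print("NORMAL", normal_ref_ad, normal_alt_ad, float(normal["VAF"]))
print("TUMOR", tumor_ref_ad, tumor_alt_ad, float(tumor["VAF"]))
```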
Suppose I need to filter this VCF file on criteria such as:
AD of the ALT allele in the TUMOR sample > 2, and VAF in the TUMOR sample > 0.3
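To make these criteria concrete, here is a minimal Python sketch of the logic I have in mind. It is not a GATK or bcftools solution. It assumes tab-separated records with a single ALT allele, AD and VAF present in FORMAT, and sample columns in the order NORMAL, TUMOR as in my file. The names `tumor_passes`, `filter_vcf`, and the threshold parameters are hypothetical.

```python
def tumor_passes(vcf_line, min_alt_ad=2, min_vaf=0.3, tumor_col=10):
    """Return True if the TUMOR sample has ALT AD > min_alt_ad and VAF > min_vaf.

    Assumes TUMOR sits in column index 10 (0-based), i.e. after
    CHROM..FORMAT and NORMAL, and that the record has one ALT allele.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    try:
        keys = cols[8].split(":")
        values = dict(zip(keys, cols[tumor_col].split(":")))
        alt_ad = int(values["AD"].split(",")[1])
        vaf = float(values["VAF"])
    except (KeyError, IndexError, ValueError):
        # Missing columns, missing keys, or "." values: treat as not passing.
        return False
    return alt_ad > min_alt_ad and vaf > min_vaf


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or tumor_passes(line):
            yield line
```

With these thresholds, the chr1:54720 record above would be kept, since its TUMOR ALT depth is 3 and its TUMOR VAF is 0.5.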
How would you write the filtering expression? I have tried several JEXL expressions in GATK, such as:
java -jar gatk-package-4.4.0.0-local.jar SelectVariants -R hg38.fa -V $vcf -select 'vc.getGenotype("TUMOR").getVAF() > 0.3 && vc.getGenotype("TUMOR").getAD().1 > 2 ' -output o.vcf
but without success. Any suggestions would be incredibly useful. Thanks so much,
Bogdan