I have a multisample VCF and I want to filter it based on:
- The sequencing depth of the first sample (a germline sample), and
- The genotype of the first (and/or the third) sample (both are germline samples).
I've been looking at GATK VariantFiltration for this. For the first sample's sequencing depth I'm using -filter='vc.getGenotype("SAMPLE").getDP>=<N>' -filter-name="germline.depth", but I can't find an expression involving the sample genotype that works for the second condition.
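To make the depth condition concrete, here is a minimal pure-Python sketch of what a per-sample depth filter does to a VCF record. It is not GATK and not a robust VCF parser: it assumes uncompressed, tab-separated data lines with a FORMAT column, and the function names (format_field, add_filter, depth_filter) are hypothetical. One point worth double-checking: VariantFiltration marks a record as filtered when the expression evaluates to true, so a ">=" comparison flags the high-depth sites. The sketch flags depth below a threshold, which corresponds to a "<" comparison in the JEXL expression.

```python
from typing import Optional

def format_field(record: str, sample_index: int, key: str) -> Optional[str]:
    """Return FORMAT value `key` for the 0-based sample column, or None if absent/missing."""
    cols = record.rstrip("\n").split("\t")
    keys = cols[8].split(":")              # FORMAT column, e.g. "GT:DP"
    values = cols[9 + sample_index].split(":")
    if key not in keys:
        return None
    i = keys.index(key)
    if i >= len(values) or values[i] == ".":
        return None
    return values[i]

def add_filter(record: str, name: str, failed: bool) -> str:
    """Write `name` into FILTER (column 7) if failed; otherwise set PASS if FILTER is still '.'."""
    cols = record.rstrip("\n").split("\t")
    current = cols[6]
    if failed:
        cols[6] = name if current in (".", "PASS") else current + ";" + name
    elif current == ".":
        cols[6] = "PASS"
    return "\t".join(cols)

def depth_filter(record: str, sample_index: int, min_dp: int, name: str = "germline.depth") -> str:
    """Flag the record when the given sample's DP is below min_dp."""
    dp = format_field(record, sample_index, "DP")
    # Assumption for this sketch: a missing DP counts as failing.
    # GATK has its own rules for missing values.
    failed = dp is None or int(dp) < min_dp
    return add_filter(record, name, failed)
```

For example, with the first sample at DP=25, depth_filter(record, 0, 30) writes germline.depth into FILTER, while depth_filter(record, 0, 20) writes PASS.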
Ideally I'd use an expression of the same form, -filter='<something>' -filter-name='something.else', because it writes the filter name into the FILTER column of the VCF, and GATK SelectVariants can then pick out the variants that haven't been marked as filtered.
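The selection step relies only on the FILTER column: a record counts as unfiltered when FILTER is PASS or ".". GATK4's SelectVariants has an --exclude-filtered option for this; the pure-Python sketch below (function names hypothetical) only mimics that behaviour on plain-text VCF lines, keeping all header lines.

```python
from typing import Iterable, Iterator

def is_unfiltered(record: str) -> bool:
    """True when the FILTER column (7th field) is PASS or '.'."""
    return record.split("\t")[6] in ("PASS", ".")

def select_unfiltered(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that were not filtered."""
    for line in lines:
        if line.startswith("#") or is_unfiltered(line):
            yield line
```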
Is there an expression like the depth one for the individual sample genotypes?
Not tested; this tests whether sample2 is HET.
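In JEXL, vc.getGenotype("SAMPLE") returns an htsjdk Genotype object, which has methods such as isHet(), isHomRef() and isHomVar(), so an expression along the lines of -filter='vc.getGenotype("SAMPLE").isHet()' is presumably the kind of genotype test meant here (not verified against a real run). The sketch below illustrates, in pure Python, how a VCF GT string maps to those categories, following my understanding of htsjdk's rules: any no-call allele mixed with called ones gives MIXED, differing called alleles give HET. The function names are hypothetical.

```python
import re

def genotype_type(gt: str) -> str:
    """Classify a GT string such as '0/1' or '1|1' (no-call alleles are '.')."""
    alleles = re.split(r"[/|]", gt)
    called = [a for a in alleles if a != "."]
    if not called:
        return "NO_CALL"
    if len(called) < len(alleles):
        return "MIXED"      # e.g. './1': partly called
    if len(set(called)) > 1:
        return "HET"        # e.g. '0/1' or '1/2'
    return "HOM_REF" if called[0] == "0" else "HOM_VAR"

def sample_is_het(record: str, sample_index: int) -> bool:
    """True when the 0-based sample's GT (assumed to be the first FORMAT key) is HET."""
    cols = record.rstrip("\n").split("\t")
    gt = cols[9 + sample_index].split(":")[0]
    return genotype_type(gt) == "HET"
```

The VCF specification requires GT to be the first FORMAT key when present, which is why the sketch reads the first field.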
Thanks! A quick test has shown that variations of this (see the answer) are what I required. The GATK help pages don't seem that helpful.