How does GATK VariantFiltration work on multi-sample VCF files?
VariantFiltration is used to flag likely false-positive SNPs based on filter expressions such as:
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" --filterName "HARD_TO_VALIDATE"
--filterExpression "DP < 5 " --filterName "LowCoverage"
--filterExpression "QD < 1.5 " --filterName "LowQD"
--filterExpression "SB > -10.0 " --filterName "StrandBias"
--filterExpression "QUAL > 30.0 && QUAL < 50.0 " --filterName "LowQual"
--clusterWindowSize 10
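To make the expressions above concrete, here is a minimal Python sketch that evaluates the same five conditions for one record. It is only an illustration in plain Python, not how GATK evaluates its expressions internally. The function name `site_filters` and the `info` dictionary are hypothetical, and the sketch assumes the values are a single set of annotations for the site with DP greater than zero.

```python
def site_filters(info, qual):
    """Return the filter names whose expression is true for one record.

    `info` is a hypothetical dict holding one value each for MQ0, DP, QD
    and SB; `qual` is the record's QUAL value. Assumes DP > 0.
    """
    mq0, dp = info["MQ0"], info["DP"]
    checks = [
        ("HARD_TO_VALIDATE", mq0 >= 4 and (mq0 / (1.0 * dp)) > 0.1),
        ("LowCoverage", dp < 5),
        ("LowQD", info["QD"] < 1.5),
        ("StrandBias", info["SB"] > -10.0),
        ("LowQual", 30.0 < qual < 50.0),
    ]
    # A filter name is attached when its expression evaluates to true.
    return [name for name, is_true in checks if is_true]


example = {"MQ0": 5, "DP": 40, "QD": 12.0, "SB": -25.0}
print(site_filters(example, qual=45.0))
# prints ['HARD_TO_VALIDATE', 'LowQual']
```

Note that a filter name is attached when its expression is true, so each expression describes the records to flag, not the records to keep.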
It is easy enough to understand how this works on single-sample VCF files, but how does it work on multi-sample VCF files?
Take the low coverage filter, for example: will it flag a SNP as LowCoverage if
a) all of the samples combined in total have less than 5 reads for the SNP
b) each of the samples has less than 5 reads
c) one or more of the samples has less than 5 reads.
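The three readings can give different answers for the same site. The Python sketch below shows this for made-up per-sample depths. It does not say which reading VariantFiltration actually uses; the function name, sample names and depths are hypothetical.

```python
def low_coverage_readings(sample_depths, threshold=5):
    """Evaluate readings a), b) and c) of 'DP < 5' for one site.

    `sample_depths` is a hypothetical mapping of sample name to read depth.
    """
    total = sum(sample_depths.values())
    below = [depth < threshold for depth in sample_depths.values()]
    return {
        "a_combined_total": total < threshold,  # all samples summed
        "b_every_sample": all(below),           # each sample is below
        "c_any_sample": any(below),             # at least one is below
    }


depths = {"sample1": 12, "sample2": 3, "sample3": 9}
print(low_coverage_readings(depths))
# prints {'a_combined_total': False, 'b_every_sample': False, 'c_any_sample': True}
```

With these depths only reading c) would flag the site, so the choice of reading changes which SNPs end up filtered.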
The same question applies to the MQ0, QD and SB filters. Are they set when:
a) all of the samples combined reach the threshold
b) each sample by itself reaches the threshold
c) one or more of the samples reach the threshold
I assume the LowQual and SnpCluster annotations are set based on all samples combined.
Thanks for providing this tool. I modified the script file filter.js to filter on GQ, but it gives an error:
javax.script.ScriptException: sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "GQ" is not defined. (<Unknown source>#16) in <Unknown source> at line number 16