Ram has already pointed out the solution. This just expands on that answer a little and provides a few examples, since you are interested in the particular commands needed.
After GATK's filtering step, the FILTER column is filled with labels indicating whether the variant met all the requirements (the label is PASS) or not (the label is anything else). Removing the variants that did not reach the hard-filter thresholds is the same as keeping only the PASS variants, so you can use a generic text-parsing tool (grep, sed, awk, ...) or a tool that handles VCF files natively (vcftools or bcftools, for instance).
An example of the first kind would be the following:
grep "^#" file.vcf > file.filtered.vcf
grep -v "^#" file.vcf | grep -w PASS >> file.filtered.vcf
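Because grep matches PASS anywhere on the line, a slightly stricter text-parsing variant is to test the FILTER column (the 7th tab-separated field) directly. The following is only a minimal sketch with awk, assuming an uncompressed VCF; the file names and the tiny demo VCF are made up for illustration.

```shell
# Build a tiny demo VCF (hypothetical data, for illustration only).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '##FILTER=<ID=PASS,Description="All filters passed">\n' >> demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT\t10\tLowQual\tDP=5\n' >> demo.vcf
printf 'chr1\t300\t.\tG\tA\t60\tPASS\tDP=40\n' >> demo.vcf

# Keep every header line (starts with "#") plus the records whose
# 7th column (FILTER) is exactly PASS.
awk -F'\t' '/^#/ || $7 == "PASS"' demo.vcf > demo.filtered.vcf
```

Doing it in a single pass keeps the header and the records in their original order and avoids picking up lines that merely mention PASS in another column.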
An example of the second kind would be the following:
bcftools view -Oz -f .,PASS file.vcf.gz > file.filtered.vcf.gz
Note that bcftools works best on VCF files that have been bgzip-compressed and tabix-indexed (the index is needed for region queries). If filtering by the PASS label is all you need, you will probably prefer the simple yet fast text-parsing options, but keep in mind that bcftools is very fast too (faster than vcftools, in fact), and it makes it easy to express the requirements for the filtered file, even when those requirements are complex.
Very basic programming skills may make a huge difference. Learn programming.
A better approach would be "What logic/process/algorithm would be ideal here?"
yes again you are correct; thanks!