4.8 years ago
tacrolimus
Dear Biostars,
I am trying to filter a multi-sample VCF that has been annotated with VEP in order to get a set of rare, likely deleterious calls. However, the file is very large and "filter_vep" is taking a very long time per file (>5 days per chromosome in an HPC environment). I have been told that the bcftools plugin split-vep performs better for this, and I was wondering what the equivalent queries would look like, as I have been struggling to write them.
For example:
filter_vep -i my.vcf -o my_filtered.vep --filter "(MAX_AF < 0.01 or not MAX_AF) and (CADD_PHRED >= 20 or not CADD_PHRED)"
Could one reproduce this using split-vep? I would want to output the entire VCF line (ideally with the header) so that the result remains a valid VCF file.
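A minimal sketch of how the filter above might be expressed with split-vep, assuming the VEP annotation lives in INFO/CSQ and that MAX_AF and CADD_PHRED appear as subfields in the header's CSQ "Format:" string. The function name and file names are hypothetical placeholders, and the expression has not been checked for exact equivalence with filter_vep (sites with several consequences may be handled differently), so please verify against `bcftools +split-vep -h`:

```shell
# Hypothetical wrapper; the input and output paths are placeholders.
# Assumes MAX_AF and CADD_PHRED are subfields of INFO/CSQ.
filter_rare_deleterious() {
    in_vcf="$1"
    out_vcf="$2"
    # -c extracts the two CSQ subfields as Float INFO tags so that -i can
    #    compare them numerically; ="." tests for a missing value.
    # No -f is given, so full VCF records are written, header included.
    bcftools +split-vep "$in_vcf" \
        -c MAX_AF:Float,CADD_PHRED:Float \
        -i '(MAX_AF<0.01 || MAX_AF=".") && (CADD_PHRED>=20 || CADD_PHRED=".")' \
        -O z -o "$out_vcf"
}
```

It would be called as `filter_rare_deleterious my.vcf.gz my_filtered.vcf.gz`. Note that the extracted MAX_AF and CADD_PHRED tags are also added to the INFO column of the output.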
Many thanks!
Hey omid.alavijeh,
Could you please show the header of the VCF file and the first few variants?
fin swimmer
Hi @finswimmer,
I work in an airlock environment, so I can't bring data out, but it looks like this (taken from another site, but basically the same).