VEP: filter_vep, custom VCF annotation (kaviar)
1
1
Entering edit mode
5.2 years ago
c.clarido ▴ 110

Hello biostars community,

I have recently annotated a multi-sample VCF using VEP standalone with plugins and custom annotations. For custom annotations, I used kaviar, goNl and gnomAD.

After annotation, I wanted to filter rare damaging missense variants. At this moment I have the following filter:

filter_vep \
    --force_overwrite \
    --input_file $1 \
    --output_file $2 \
    --only_matched \
    --filter "gnomADG_AF_NFE < 0.01 and kaviar_AF < 0.01 and goNl_AF <  0.01" \
    --filter "Polyphen2_HDIV_pred match D and SIFT4G_pred match D" \
    --filter "CADD_phred > 20"  \
    --filter "phastCons100way_vertebrate >= 0.3 and phyloP100way_vertebrate >= 0"

For example I have a variant with the following annotation:

chr1    1500144 rs186758134     C       T       2077.27 .       AC=10;AF=0.179;AN=56;BaseQRankSum=3.76;ClippingRankSum=0;DB;DP=446;ExcessHet=2.2123;FS=4.535;InbreedingCoeff=0.0252;MLEAC=10;MLEAF=0.179;MQ=58.52;MQRankSum=1.14;QD=13.15;ReadPosRankSum=1.58;SOR=0.185;VariantCaller=Intersection;CSQ=T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000308647|protein_coding|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000472194|retained_intron|||||||||||2296|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000474481|retained_intron|||||||||||3943|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000485748|retained_intron|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641679|transcribed_unprocessed_pseudogene|||||||||||3438|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02,T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641974|processed_transcript|||||||||||3106|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

For clarity, there are multiple CSQ fields:

CSQ=T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000308647|protein_coding|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000472194|retained_intron|||||||||||2296|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000474481|retained_intron|||||||||||3943|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|downstream_gene_variant|MODIFIER|ATAD3B|ENSG00000160072|Transcript|ENST00000485748|retained_intron|||||||||||3942|1||HGNC|HGNC:24007|||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641679|transcribed_unprocessed_pseudogene|||||||||||3438|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

T|upstream_gene_variant|MODIFIER|AL645728.2|ENSG00000284740|Transcript|ENST00000641974|processed_transcript|||||||||||3106|1||Clone_based_ensembl_gene||||||||||||||rs186758134|0.016|1:1500143-1500145&rs186758134|0.0010615&0.0162256|rs186758134|7.74330e-02

Take kaviar AF values for example: 0.0010615&0.0162256

I tried running filter_vep however I get this kind messages in the stdout:

Argument "0.0001896&0.8633710&0.0302146" isn't numeric in numeric lt (<) at cedrick/conda/envs/ensembl-vep/share/ensembl-vep-98.0-0/modules/Bio/EnsEMBL/VEP/FilterSet.pm line 679, <GEN0> line 2305.

How does filter_vep go about processing this? Does it skip it?

vep kaviar filter_vep • 2.0k views
ADD COMMENT
2
Entering edit mode
5.2 years ago
Emily 24k

The filter script will just skip the variant and it won't end up in your filtered data, even if it ought to pass the filter. Your best option is to clean up your data to ensure that it's in a format that the filters can manage.

ADD COMMENT
0
Entering edit mode

Hello Emily,

Thank you for the quick response. I used the --custom as follow: --custom ~{kaViarPath},kaviar,vcf,exact,0,AF wouldn't it have matched only the exact coordinates?

ADD REPLY
0
Entering edit mode

The custom VCFs match by rsID, not by coordinates, but these should be distinct.

You should check what's in the kaviar VCF. If you're getting these fields with "&" in them, is that what's in kaviar?

ADD REPLY
0
Entering edit mode

Hello, actually kaviar had variants with multiple allele

ADD REPLY

Login before adding your answer.

Traffic: 2000 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6