How to filter mutant allele frequency in Variant Effect Predictor (VEP)?
7.5 years ago
kin182 ▴ 10

This is my first time using Variant Effect Predictor (VEP), and I would like to use it to annotate the VCF files I got from WES data. I tried to set up filters to keep only the mutations with a mutant allele frequency higher than 0.2 (number of mutant counts / total number of counts > 0.2).

This is the code I used:

./vep --cache --offline --symbol --coding_only \
--freq_freq 0.2 --freq_gt_lt gt --freq_filter include \
-i input.vcf -o output.txt

I checked the results by loading the BAM files in IGV. However, almost all the mutations in the results so far have an allele frequency < 0.2. For example:

Total counts: 118
A: 0
C: 0
G: 102 (86%, 86+, 16-)
T: 16 (14%, 16+, 0-)
N: 0

The G -> T mutation has an allele frequency of only 0.14 (16/118).

Does anyone have experience using VEP? I may be using it incorrectly; could you point out what I am missing? Thank you.

perl vcf vep ensembl • 4.2k views
7.4 years ago
EnsemblWill ▴ 570

VEP cannot do the filtering on the data as you have it.

Typically, frequency data is encoded in the INFO field of the VCF file, and VEP's accompanying filter script would allow you to filter on such a field. However, looking at the snippet you have pasted, VEP is unable to filter on this because it is not in a standard format. Indeed, I'd be surprised if there was any software that could do this out of the box, except perhaps whatever was used to generate this VCF.

If I were doing this task, I'd write a short Perl script to process the data and filter it.
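
As a rough illustration of that kind of script, here is a minimal sketch in Python rather than Perl. It is not part of VEP. The function names `alt_allele_fraction` and `filter_vcf` are hypothetical. It assumes that the `AD` value in the sample column is `ref_count,alt_count` for a single ALT allele, as the records posted in this thread appear to be, and that the in-house frequency can be taken as ALT reads divided by the sum of the two `AD` counts.

```python
def alt_allele_fraction(vcf_line):
    """Return ALT reads / (REF + ALT reads) for the first sample, or None.

    Assumption: the FORMAT column contains an AD key whose value is
    "ref_count,alt_count" for a single ALT allele.
    """
    fields = vcf_line.split()
    if len(fields) < 10:
        return None  # no FORMAT/sample columns to read
    # Pair each FORMAT key with the matching value in the sample column.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    try:
        ref_count, alt_count = (int(x) for x in sample["AD"].split(",")[:2])
    except (KeyError, ValueError):
        return None  # AD missing or not two integers
    total = ref_count + alt_count
    return alt_count / total if total else None


def filter_vcf(lines, min_fraction=0.2):
    """Yield header lines unchanged, plus records with ALT fraction > min_fraction."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fraction = alt_allele_fraction(line)
        if fraction is not None and fraction > min_fraction:
            yield line
```

The filtered records could then be written to a new VCF and passed to VEP with `-i`, for example by looping over `filter_vcf(open("input.vcf"))`. For the 1:183629 G>A record posted in this thread (`AD` of `14,6`), the sketch computes 6 / 20 = 0.3, so that record would be kept at a 0.2 threshold.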

7.5 years ago

AF is the field for filtering by allele frequency. The following is copied from the VEP manual (http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html):

Note that for numeric fields, such as the *AF allele frequency fields, filter_vep does not consider the absence of a value for that field as equivalent to a 0 value. For example, if you wish to find rare variants by finding those where the allele frequency is less than 1% or absent, you should use the following:

--filter "AF < 0.01 or not AF"

Please post a few lines of the VCF here (with or without headers) that are not being filtered by your pipeline.

Here are a few lines of VCF:

1 69428 . T G . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.var:codon.dir 0,2:2:2:0:2:0:2:35.5:2:0:0:0:34.5:12.5:0
1 69511 . A G . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.var:codon.dir 0,2:2:2:0:2:0:2:37.5:0:0:2:0:40:2:0
1 183629 . G A . . . AD:DP:n.read.pos:n.read.pos.ref:raw.count:raw.count.ref:raw.count.total:mean.quality:mean.quality.ref:count.plus:count.plus.ref:count.minus:count.minus.ref:read.pos.mean:read.pos.mean.ref:read.pos.var:read.pos.var.ref:codon.dir 14,6:20:6:13:6:14:20:37.5:36.8571:6:13:0:1:32.1667:28.6429:527.506:431.971:0

I want to filter on the mutant allele frequency computed from my own data (an in-house frequency: the number of counts carrying the mutation divided by the total number of counts in the BAM file), not on allele frequencies from 1000 Genomes. Can VEP do this?

VEP assumes standard VCF when filtering standard fields such as AF. Unless the source file has AF in standard format, it won't work.
