VCF format query
8.8 years ago
AW ▴ 350

Hi all,

I have a couple of questions about the VCF format.

I only want to call SNPs in individuals where the variant allele frequency is >75%.

GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR
0/1:6:5:5:3:2:40%:2.2222E-1:34:36:0:3:0:2

What is the difference between FREQ (40%) and AD/(RD+AD) (2/3)? How is FREQ calculated?

Thanks

8.8 years ago

First, please have a look at the VCF header, e.g., using

bcftools view --header-only "$vcf_filename"

According to the VCF specification, the header (every line beginning with ##) should contain meta-information describing every INFO and FORMAT field, which may also explain how FREQ is calculated. If this does not help, please look at the documentation of the caller you used, or at least tell us which caller it was.
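
If the header is long, it may help to pull out just the FORMAT descriptions. Here is a minimal Python sketch, assuming the header has already been read into a list of lines; the function name `format_descriptions` and the sample lines are only illustrative, and the regular expression assumes the Description is the last attribute on the line.

```python
import re

def format_descriptions(header_lines):
    """Map each FORMAT ID to its Description text from '##FORMAT=' header lines."""
    # Assumes the common layout: ##FORMAT=<ID=...,Number=...,Type=...,Description="...">
    pattern = re.compile(r'^##FORMAT=<ID=([^,]+),.*Description="([^"]*)">$')
    found = {}
    for line in header_lines:
        match = pattern.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found

# Illustrative header lines (hypothetical input, not read from a real file)
header = [
    '##fileformat=VCFv4.1',
    '##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">',
    '##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">',
    '##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">',
]

descriptions = format_descriptions(header)
for field_id, text in descriptions.items():
    print(field_id, "->", text)
```

Note that a description only says what a field means; if it does not spell out the formula, the caller's manual is the next place to look.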


Thanks. It's not much help, though.

##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">

I made the VCF file with VarScan 2.


Well, let me google that for you and have a look into the VarScan user manual (Google hit no. 3):

FREQ = frequency of variant allele by read count = AD / DP

In fact, this might be different from

AD/(RD+AD) = 2/(3+2)

(note that your question gives this incorrectly as 2/3) when multiple alleles are found. For example, consider a reference nucleotide A with coverage 3 and alternative alleles G and C with coverage 2 and 5, respectively. Now look at the call for the most abundant variant, C:

FREQ = AD/DP = 5/(3+2+5) = 0.5 != 5/8 = AD / (RD+AD)
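
To make the arithmetic concrete, here is a small Python sketch that pairs the FORMAT keys with the sample values from the question and computes both ratios, then repeats the calculation for the three-allele example. The helper names `parse_sample` and `allele_fractions` are made up for illustration, and the 0.75 cutoff is just the threshold mentioned in the question; this is not how VarScan itself is implemented.

```python
def parse_sample(format_keys, sample_values):
    """Pair colon-separated FORMAT keys with the matching sample values."""
    return dict(zip(format_keys.split(":"), sample_values.split(":")))

def allele_fractions(ad, rd, dp):
    """Return (AD / DP, AD / (RD + AD)) as floats."""
    return ad / dp, ad / (rd + ad)

# FORMAT and sample columns exactly as posted in the question
fields = parse_sample(
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "0/1:6:5:5:3:2:40%:2.2222E-1:34:36:0:3:0:2",
)
ad, rd, dp = int(fields["AD"]), int(fields["RD"]), int(fields["DP"])
by_depth, by_two_alleles = allele_fractions(ad, rd, dp)

# FREQ is stored as a string such as "40%", so strip the percent sign first
reported = float(fields["FREQ"].rstrip("%")) / 100
passes_cutoff = reported > 0.75  # the >75% threshold from the question

# Three-allele example: reference A = 3 reads, G = 2 reads, C = 5 reads (call for C)
multi_by_depth, multi_by_two_alleles = allele_fractions(ad=5, rd=3, dp=3 + 2 + 5)

print(by_depth, by_two_alleles, reported, passes_cutoff)
print(multi_by_depth, multi_by_two_alleles)
```

For the genotype in the question, only two alleles are present, so both ratios agree with the reported FREQ; they diverge only in the multi-allelic case.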
