Obtaining the AF and DP values for variants in a VCF
18 months ago
bt_cepo ▴ 40

Hi all!

I have produced several VCF files and would like to extract the AF and DP values so that I can plot their distributions and summarize my data.

I have reviewed past questions and documentation, but their solutions do not seem to work for my output because AF is not in the INFO column, only in the sample column. DP is present in both the INFO and the sample columns.

This is the format of my variants, which were generated with Mutect2:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE_3
1       11272969        .       A      C       .       .       AS_SB_TABLE=53,110|1,9;DP=173;ECNT=1;MBQ=32,33;MFRL=158,158;MMQ=60,60;MPOS=57;POPAF=7.3;RPA=4,3;RU=A;STR;TLOD=14.73     GT:AD:AF:DP:F1R2:F2R1:FAD:SB    0/1:163,10:0.061:173:147,8:0,0:162,10:53,110,1,9

How can I retrieve these values? I cannot find a way to do it when the values are not in the INFO column, but there must be a solution, since other people work with this VCF format.

Thanks!

DP mutect2 SNP vcf AF • 2.1k views

> I have reviewed past questions and documentation

Have you looked at the examples in the bcftools query documentation?


Yes, but they do not seem to cover the per-sample values listed in the FORMAT column. In fact, when I run this to display all the FORMAT values:

$ bcftools query -f "%FORMAT" SAMPLE_3.vcf

I receive:

Error: no such tag defined in the VCF header: INFO/FORMAT. FORMAT fields must be in square brackets, e.g. "[ FORMAT]"

But of course my VCF header does not define an INFO/FORMAT tag, because FORMAT is not part of INFO.

The same happens with %FORMAT/AF: I get the same error.


Read the error message properly: FORMAT fields must be in square brackets, e.g. "[ FORMAT]". If you read the examples there, you won't be asking this question again, especially after the exact section has been pointed out to you.


You are completely right! I completely misunderstood the message and the examples.

Thank you so much for taking the time to reply twice even though you had already given the answer. I really appreciate it.


No worries, I'm glad you took this as an opportunity to learn - many people lash out at not being given the answer right away, but doing that would take away the pleasure of discovering the exact answer yourself.

Please add the exact solution as an answer and accept it for the benefit of future users.

18 months ago
bt_cepo ▴ 40

I was able to retrieve this information using bcftools query. As stated in the bcftools documentation, FORMAT fields can be accessed by enclosing them in square brackets, which loop over the samples in the file.

I stored the AF and DP values in a CSV file (stats.csv) using this command:

bcftools query -f '[%CHROM,%POS,%AF,%DP\n]' SAMPLE_3.vcf > stats.csv

The output looks like this, with the chromosome, position, AF and DP for each variant:

1,11272969,0.061,173
2,25466715,0.091,186
3,10183534,0.052,119
3,37053609,0.14,164
3,37090052,0.055,180
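
Since the original goal was to describe the AF and DP distributions, here is a minimal sketch of how stats.csv could be summarized with awk before plotting. It assumes the four-column layout shown above (chromosome, position, AF, DP), one sample per file, and no missing values; the five rows are copied from the output above only so that the snippet runs on its own.

```shell
# Example rows copied from the output above; with real data, stats.csv
# would come from the bcftools query command instead of this heredoc.
cat > stats.csv <<'EOF'
1,11272969,0.061,173
2,25466715,0.091,186
3,10183534,0.052,119
3,37053609,0.14,164
3,37090052,0.055,180
EOF

# Column 3 is AF and column 4 is DP (assumed layout).
summary=$(awk -F',' '
    {
        af = $3 + 0; dp = $4 + 0          # force numeric comparison
        if (NR == 1) { af_min = af_max = af; dp_min = dp_max = dp }
        af_sum += af; dp_sum += dp
        if (af < af_min) af_min = af
        if (af > af_max) af_max = af
        if (dp < dp_min) dp_min = dp
        if (dp > dp_max) dp_max = dp
    }
    END {
        if (NR == 0) { print "no variants"; exit }
        printf "n=%d\n", NR
        printf "AF min=%s max=%s mean=%.4f\n", af_min, af_max, af_sum / NR
        printf "DP min=%s max=%s mean=%.1f\n", dp_min, dp_max, dp_sum / NR
    }
' stats.csv)

echo "$summary"
```

A file with several samples would need an extra column to tell the samples apart, and missing values (written as ".") would have to be filtered out first; neither case is handled in this sketch.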

Looking in the documentation is precisely what my brother Ram had suggested.


Yeah, OP followed my suggestions wholeheartedly and arrived at this solution that they then shared with us.
