VCF tools --freq output file
4.4 years ago

Hello,

I am using vcftools --freq to obtain allele frequencies from my VCF file, which contains 4 individuals. I ran the command below:

vcftools --vcf A.vcf --freq --out A.frq

My output file looks like below:

CHROM   POS     N_ALLELES       N_CHR   {ALLELE:FREQ}
contig_1        279875  2       0       T:-nan  C:-nan
contig_3        277244  3       0       G:-nan  A:-nan  T:-nan
contig_3        277247  2       0       C:-nan  T:-nan
contig_4        8794    2       0       A:-nan  G:-nan
contig_4        78125   2       8       G:0     A:1
contig_4        219961  2       8       G:0     C:1
contig_4        250382  2       8       T:0     C:1
contig_11       123877  2       6       T:0.166667      C:0.833333

I was unable to find a description of the output headers for the .frq file. Which allele comes first, the major allele or the minor allele? What does -nan signify?

Please let me know if I can find this information anywhere.

Any help would be appreciated. Thank you so much!

allele-frequency VCFtools • 4.1k views
4.4 years ago

Hey,

The header for the output is included in the output itself. For example:

CHROM           POS     N_ALLELES   N_CHR   {ALLELE:FREQ}
contig_11       123877  2           6       T:0.166667  C:0.833333

From this, I can see that there are 2 unique alleles (N_ALLELES) at position contig_11:123877, and these are observed across 6 total alleles (N_CHR) - these have the following frequencies:

  • T:0.166667
  • C:0.833333

My crude mathematics tell me that there is 1 T base, and 5 C bases.
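To make that arithmetic explicit, here is a minimal Python sketch that multiplies each frequency by N_CHR to recover the allele counts. The function name `allele_counts` is hypothetical and not part of vcftools. It also illustrates what is most likely behind the -nan values: those rows have N_CHR = 0, i.e. no called genotypes at the site, so the frequency would be 0 divided by 0 and is undefined.

```python
import math


def allele_counts(n_chr, allele_freqs):
    """Convert ALLELE:FREQ fields into approximate allele counts.

    Hypothetical helper for illustration; returns None for an allele
    whose frequency is undefined (printed as -nan in the .frq file).
    """
    counts = {}
    for field in allele_freqs:
        allele, freq_text = field.rsplit(":", 1)
        freq = float(freq_text)  # float() parses "-nan" as NaN
        counts[allele] = None if math.isnan(freq) else round(freq * n_chr)
    return counts


# contig_11:123877 from the output above: 6 chromosomes with data
print(allele_counts(6, ["T:0.166667", "C:0.833333"]))  # {'T': 1, 'C': 5}

# contig_1:279875: N_CHR is 0, so no frequency can be computed
print(allele_counts(0, ["T:-nan", "C:-nan"]))  # {'T': None, 'C': None}
```

With 4 diploid individuals the maximum N_CHR is 8, so N_CHR = 6 would mean one individual has a missing genotype at that site.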

Kevin


Hi, thanks for your reply. Going by that calculation, can I confidently say that the first {ALLELE:FREQ} column is the minor allele frequency (MAF), so that I can use it for a MAF plot? Also, what should I infer from a SNP with three alleles, as seen at contig_3:277244? Do you have any suggestions? Thank you.


For the multi-allelic site, you may want to remove those, or at least split them - see my answer here: A: Remove duplicate SNPs only based on SNP ID in bcftools

Irrespective of multi-alleles or not, I am not sure, given the history of these programs, that you can have 100% confidence that the first column always relates to the minor allele. I would implement a check via awk in order to detect the minor allele and then take that.
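The same kind of check can be sketched in Python instead of awk. This is only an illustration under stated assumptions: the function name `minor_allele` is hypothetical, the input is assumed to be whitespace-separated data lines in the .frq layout shown in the question, and sites that are not bi-allelic or have undefined (-nan) frequencies are simply skipped. It takes the allele with the smaller frequency, regardless of which column it appears in.

```python
import math


def minor_allele(frq_line):
    """Return (allele, frequency) of the rarer allele on one .frq data line.

    Hypothetical helper: returns None for sites that are not bi-allelic
    or whose frequencies are undefined (-nan).
    """
    fields = frq_line.split()
    if int(fields[2]) != 2:  # N_ALLELES column
        return None
    pairs = []
    for field in fields[4:]:  # the {ALLELE:FREQ} columns
        allele, freq_text = field.rsplit(":", 1)
        pairs.append((allele, float(freq_text)))
    if any(math.isnan(freq) for _, freq in pairs):
        return None
    # Pick by frequency, not by column position
    return min(pairs, key=lambda pair: pair[1])


lines = [
    "contig_3   277244  3  0  G:-nan  A:-nan  T:-nan",
    "contig_4   78125   2  8  G:0     A:1",
    "contig_11  123877  2  6  T:0.166667  C:0.833333",
]
for line in lines:
    print(minor_allele(line))
```

When reading a real .frq file, the header line starting with CHROM would need to be skipped before calling the function.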


Thank you. I suppose I can use

bcftools view --max-alleles 2

to retain positions with at most 2 alleles. Then I will try awk to obtain just the minor allele frequency.
