Hello,
I am using vcftools --freq to obtain allele frequencies from my VCF file, which contains 4 individuals. I ran the following command:
vcftools --vcf A.vcf --freq --out A.frq
My output file looks like this:
CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}
contig_1 279875 2 0 T:-nan C:-nan
contig_3 277244 3 0 G:-nan A:-nan T:-nan
contig_3 277247 2 0 C:-nan T:-nan
contig_4 8794 2 0 A:-nan G:-nan
contig_4 78125 2 8 G:0 A:1
contig_4 219961 2 8 G:0 C:1
contig_4 250382 2 8 T:0 C:1
contig_11 123877 2 6 T:0.166667 C:0.833333
I was unable to find a description of the output headers for the .frq file. Which allele comes first, the major allele or the minor allele? What does -nan signify?
Please let me know if I can find this information anywhere.
Any help would be appreciated. Thank you so much!
Hi, thanks for your reply. Going by the calculation, can I confidently say that the first column is the MAF when creating a MAF plot? What should I infer from a SNP having three alleles, as seen at contig_3:277244? Do you have any suggestions about it? Thank you.
For the multi-allelic site, you may want to remove those, or at least split them - see my answer here: A: Remove duplicate SNPs only based on SNP ID in bcftools
Irrespective of whether a site is multi-allelic or not, I am not sure, given the history of these programs, that you can have 100% confidence that the first column always relates to the minor allele. I would implement a check via awk in order to detect the minor allele and then take that.
Thank you. I suppose I can use,
for retaining positions with only 2 alleles. Then, I will try awk and obtain just the minor allele frequency.
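For anyone landing here later, this is a minimal sketch of the awk check suggested above, not a tested pipeline. It assumes the .frq layout shown in the question (one header line, then CHROM, POS, N_ALLELES, N_CHR, followed by ALLELE:FREQ fields). The function name minor_allele_freq is made up for illustration. It keeps only biallelic sites, skips sites where N_CHR is 0 (those are the rows reporting -nan), and prints whichever allele has the lower frequency instead of trusting the column order.

```shell
# Hypothetical helper: report the minor allele and its frequency per site.
# Assumes the vcftools --freq layout shown in the question.
minor_allele_freq() {
    awk 'NR > 1 && $3 == 2 && $4 > 0 {      # skip header, non-biallelic sites, N_CHR == 0
        min = 2; allele = ""; freq = ""
        for (i = 5; i <= NF; i++) {
            split($i, pair, ":")            # pair[1] = allele, pair[2] = frequency
            if (pair[2] + 0 < min) {        # strict "<": on a 0.5/0.5 tie the first allele wins
                min = pair[2] + 0
                allele = pair[1]
                freq = pair[2]              # keep the original text to avoid reformatting
            }
        }
        print $1, $2, allele, freq
    }' "$1"
}

# Usage (the file name is whatever vcftools wrote for you):
# minor_allele_freq your_output.frq > maf.txt
```

The output has four space-separated columns (CHROM, POS, minor allele, frequency), so the last column can go straight into a MAF plot.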