I am trying to figure out why the allele frequencies I get from Plink and vcftools differ from the ones in my VCF file. The VCF file was produced by GATK joint genotyping and filtered using vcftools. It contains 448 samples. Here is a line from the VCF:
DS235882 421 . T A 10758.8 . AC=29;AF=0.186;AN=156;BaseQRankSum=0.394;DP=1264;ExcessHet=0.0000;FS=1.155;InbreedingCoeff=0.6596;MLEAC=32;MLEAF=0.205;MQ=59.92;MQRankSum=0.00;QD=33.41;ReadPosRankSum=0.335;SOR=0.617 GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:22,0:22:60:.:.:0,60,900:. ./.:0,47:.:99:1|1:405_G_A:2166,147,0:405 ./
I computed the allele frequencies with vcftools:
vcftools --vcf input.vcf --freq2 --out allele_freq --max-alleles 2
and this is what I got:
CHROM POS N_ALLELES N_CHR {FREQ}
DS235882 101 2 140 1 0
DS235882 117 2 148 1 0
DS235882 128 2 150 1 0
DS235882 206 2 160 1 0
I also tried plink:
plink --vcf input.vcf --freq --allow-extra-chr --out allele_freq
#CHROM ID REF ALT ALT_FREQS OBS_CT
DS235882 . T A 0 140
DS235882 . T A 0 148
DS235882 . A T 0 150
DS235882 . A G 0 160
The vcftools output gives frequencies of 1 and 0 at every site, and plink gives an ALT frequency of 0 at every site. What did I do wrong? Any advice or tips? I also extracted the AF values from the INFO column of my VCF, and they look like this:
DS235882 101 0.012
DS235882 117 0.041
DS235882 128 0.042
DS235882 206 0.012
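One way to check the INFO AF values against the genotypes themselves is to recount the alleles from the GT fields. Below is a minimal Python sketch of that check. The function names and the toy line are made up for illustration and are not from my data. It assumes tab-separated VCF data lines and counts every non-reference allele as ALT.

```python
def alt_freq_from_genotypes(vcf_line):
    """Recount alleles from the GT fields of one VCF data line.

    Returns (alt_count, called_alleles, alt_freq). Missing alleles (".")
    are skipped, and any allele other than "0" is counted as ALT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    alt_count = 0
    called = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not part of the denominator
            called += 1
            if allele != "0":
                alt_count += 1
    freq = alt_count / called if called else None
    return alt_count, called, freq


def info_value(vcf_line, key):
    """Return the raw string value of one INFO key, or None if absent."""
    info = vcf_line.split("\t")[7]
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


# Hypothetical toy line: INFO says AF=0.5, but the remaining called
# genotypes are all 0/0 (the other two samples are missing).
toy_line = "\t".join([
    "chrToy", "100", ".", "T", "A", "50", ".",
    "AC=2;AF=0.5;AN=4",
    "GT:DP",
    "0/0:22", "./.:3", "./.:2", "0/0:30",
])

print("INFO AF:", info_value(toy_line, "AF"))
print("Recounted (alt, called, freq):", alt_freq_from_genotypes(toy_line))
```

If the recounted frequency comes out as 0 while INFO still shows a non-zero AF, that might mean the INFO field was not updated after some genotypes were set to missing, but that is only a guess on my part.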
I want to find out why these analyses wouldn't work. Thanks in advance.
Hi,
did you figure that out? I have a similar issue with allele counts: I obtained higher allele counts when the samples were processed with GATK3.8 than with GATK4.