Right now I am doing this:
# get A1 A2 frequency
plink --bfile {my_bfile} --freqx --out {my_bfile}
Originally I was just going to check which allele has the higher allele count, but then I saw this in the PLINK docs: "For alleles that have exactly 0.50 minor allele frequency...then which allele is labelled as minor will depend on which was first encountered in the PED file."
I am really hoping I don't have to extract those variants, convert them to PED, and then figure out which allele came first. Is there a better way?
I don't want to set the --a1-allele / --a2-allele flags, because I don't know whether that information will be lost in downstream applications. I want to know which alleles are major and minor under PLINK's default behavior.
Yes, but I think the problem is what to do when MAF = 0.5
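For what it's worth, you can at least find out which variants are affected without going back to PED. Here is a minimal Python sketch that reads a .frqx report and lists the variants where the A1 and A2 allele counts are exactly equal. It assumes the file is tab-delimited with the ten columns described in the PLINK 1.9 docs (CHR, SNP, A1, A2, C(HOM A1), C(HET), C(HOM A2), C(HAP A1), C(HAP A2), C(MISSING)). The function names and the two sample rows are made up for illustration, so check the header of your own file first.

```python
import csv
import io

# Made-up sample in the assumed .frqx layout (tab-delimited, ten columns).
SAMPLE = (
    "CHR\tSNP\tA1\tA2\tC(HOM A1)\tC(HET)\tC(HOM A2)\tC(HAP A1)\tC(HAP A2)\tC(MISSING)\n"
    "1\trs_demo1\tA\tG\t10\t20\t10\t0\t0\t0\n"
    "1\trs_demo2\tC\tT\t2\t10\t28\t0\t0\t0\n"
)


def allele_counts(row):
    """Return (A1 count, A2 count) from one row of genotype counts."""
    het = int(row["C(HET)"])
    # Each homozygote carries two copies, each het or haploid call carries one.
    a1 = 2 * int(row["C(HOM A1)"]) + het + int(row["C(HAP A1)"])
    a2 = 2 * int(row["C(HOM A2)"]) + het + int(row["C(HAP A2)"])
    return a1, a2


def find_ties(frqx_file):
    """Return the IDs of variants whose A1 and A2 counts are exactly equal."""
    ties = []
    for row in csv.DictReader(frqx_file, delimiter="\t"):
        a1, a2 = allele_counts(row)
        if a1 == a2 and a1 > 0:  # skip variants with no called genotypes
            ties.append(row["SNP"])
    return ties


print(find_ties(io.StringIO(SAMPLE)))
```

On a real file you would pass an open file handle instead of the StringIO, e.g. open("{my_bfile}.frqx"). Comparing integer counts rather than frequencies avoids any floating-point rounding around 0.5.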
In the case of MAF = 0.5, I think A1 is still the minor allele. You can check the discussion in this link.
According to the link:
"When generating such filesets, PLINK 1.x defaults to swapping the alleles whenever A1's frequency is above (not equal to) 0.5"
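To spell out what that sentence implies, here is a small Python sketch of the rule as quoted. It is only my reading of the quote, not PLINK's actual code, and the function name is hypothetical. The point is that the swap happens only when A1 is strictly more common, so at an exact tie the existing A1/A2 order is left alone.

```python
def default_allele_order(a1, a2, a1_count, a2_count):
    """Return (A1, A2) after applying the quoted default rule.

    Swap only when A1's frequency is strictly above 0.5, which is the
    same as A1's count being strictly greater than A2's count.
    """
    if a1_count > a2_count:
        return a2, a1  # A1 was the major allele, so it becomes A2
    return a1, a2      # A1 is minor, or it is an exact tie: order is kept


# A1 more common: alleles are swapped.
print(default_allele_order("A", "G", 60, 40))
# Exact tie (MAF = 0.5): input order is preserved.
print(default_allele_order("A", "G", 50, 50))
```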
That makes sense, although it seems to run a little counter to what is in the docs. Maybe the docs are out of date.