Easy way to find out which allele is minor allele from bed file?
6 months ago
curious ▴ 810

Right now I am doing this:

get A1 A2 frequency

plink --bfile {my_bfile} --freqx --out {my_bfile}

Originally I was just going to check which allele has the higher allele count, but then I saw this in the PLINK docs: "For alleles that have exactly 0.50 minor allele frequency...then which allele is labelled as minor will depend on which was first encountered in the PED file."

I am really hoping I don't have to extract those variants, convert to PED, and then figure out which allele came first. Is there a better way?

I don't want to set the --a1-allele / --a2-allele flags, because I don't know whether that information will be lost in downstream applications. I want to know what the major and minor alleles are under PLINK's default behavior.

plink • 846 views
6 months ago
bk11 ★ 3.0k

In PLINK, A1 is usually the minor allele and A2 the major allele. From the file format docs:

.frq (basic allele frequency report)
Produced by --freq.

A text file with a header line, and then one line per variant with the following six fields:

CHR      Chromosome code
SNP      Variant identifier
A1       Allele 1 (usually minor)
A2       Allele 2 (usually major)
MAF      Allele 1 frequency
NCHROBS  Number of allele observations

Yes, but I think the problem is what to do when MAF = 0.5


In the case of MAF = 0.5, I think A1 is still the allele labelled as minor. You can check the discussion in this link.


According to the link:

"When generating such filesets, PLINK 1.x defaults to swapping the alleles whenever A1's frequency is above (not equal to) 0.5"

That makes sense, though it seems to run a little counter to what is in the docs; maybe the docs are out of date.
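For anyone who wants to check this against their own data, here is a minimal sketch that reads the --freqx output and flags exact ties, so the 0.5 cases can be inspected separately. It assumes the .frqx file is tab-delimited with the columns CHR, SNP, A1, A2, C(HOM A1), C(HET), C(HOM A2), C(HAP A1), C(HAP A2), C(MISSING) in that order; please verify this against your PLINK version. The function name `classify_frqx` and the sample rows are made up for illustration.

```python
import csv
import io


def classify_frqx(handle):
    """Yield (snp, a1, a2, a1_count, a2_count, status) for each variant.

    Assumes a tab-delimited .frqx layout with the genotype counts in
    columns 5-9: C(HOM A1), C(HET), C(HOM A2), C(HAP A1), C(HAP A2).
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    for row in reader:
        if len(row) < 9:
            continue  # skip blank or truncated lines
        snp, a1, a2 = row[1], row[2], row[3]
        hom_a1, het, hom_a2, hap_a1, hap_a2 = (int(x) for x in row[4:9])
        # Each homozygote carries two copies, each het or haploid call one.
        a1_count = 2 * hom_a1 + het + hap_a1
        a2_count = 2 * hom_a2 + het + hap_a2
        if a1_count < a2_count:
            status = "A1 minor"
        elif a1_count > a2_count:
            status = "A2 minor"
        else:
            status = "tie"  # frequency exactly 0.5: neither allele is minor
        yield snp, a1, a2, a1_count, a2_count, status


# Hypothetical example rows, not real data.
SAMPLE = (
    "CHR\tSNP\tA1\tA2\tC(HOM A1)\tC(HET)\tC(HOM A2)\tC(HAP A1)\tC(HAP A2)\tC(MISSING)\n"
    "1\trs1\tA\tG\t1\t2\t7\t0\t0\t0\n"
    "1\trs2\tC\tT\t2\t6\t2\t0\t0\t0\n"
    "1\trs3\tG\tA\t6\t2\t1\t0\t0\t1\n"
)

for record in classify_frqx(io.StringIO(SAMPLE)):
    print(*record, sep="\t")
```

To run it on a real file, pass `open("my_bfile.frqx")` instead of the `io.StringIO` wrapper. Going by the statement quoted above, the "tie" rows are the ones where PLINK 1.x would not swap alleles, so A1 in the fileset is whatever A1 already was.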

6 months ago

This question does not make sense. If you care about allele-order "information [being] lost in downstream applications", REF/ALT is the obvious solution. Yes, it is necessary to be aware that REF is occasionally minor, but at least it is well-defined. Trying to be consistent about major/minor when some allele frequencies are exactly 0.5 is a good way to waste a large amount of time accomplishing next to nothing, and that's before we even talk about allele frequencies (and thus a few major/minor statuses) changing whenever a single sample is filtered out or added.
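To make the last point concrete, here is a toy sketch (made-up genotypes, hypothetical helper name `minor_allele`) showing how the minor allele at a single variant can change when one sample is dropped. It only illustrates the counting argument and does not reproduce any PLINK internals.

```python
from collections import Counter


def minor_allele(genotypes):
    """Return the less frequent allele, or None on a tie or monomorphic site.

    Each genotype is a two-character string such as "AG".
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # counts each allele character
    if len(counts) < 2:
        return None  # only one allele observed
    (_, n_top), (bottom, n_bottom) = counts.most_common(2)
    if n_top == n_bottom:
        return None  # exact 0.5 frequency: no well-defined minor allele
    return bottom


# Hypothetical genotypes for four samples at one biallelic variant.
samples = {"s1": "AA", "s2": "AG", "s3": "GG", "s4": "AG"}


def without(sample_id):
    """Genotypes with one sample filtered out."""
    return [gt for sid, gt in samples.items() if sid != sample_id]


print("all samples:", minor_allele(list(samples.values())))
print("drop s3:    ", minor_allele(without("s3")))
print("drop s1:    ", minor_allele(without("s1")))
```

With all four samples the A and G counts are equal, so there is no minor allele; dropping s3 makes G minor, while dropping s1 makes A minor. REF/ALT labels do not move under this kind of filtering.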
