Question

Allele coding in BGENIE GWAS output

4.8 years ago

gokberk ▴ 90

Hi all, I have a quick question about BGENIE GWAS summary stats. In the summary statistics, the alleles are coded as a_0 and a_1, as in the following:

chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t ...
22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215 ...
22 22:16050115:G:A 16050115 G A 0.00545 1 -0.00022679 0.010577 -0.021441 ...
22 22:16050213:C:T 16050213 C T 0.00635 1 -0.0053945 0.010732 -0.50266 ...
22 22:16050319:C:T 16050319 C T 0.00115 1 -0.0072811 0.010548 -0.69025 ...
22 22:16050527:C:A 16050527 C A 0.00045 1 -0.010907 0.011428 -0.95444 ...
22 22:16050568:C:A 16050568 C A 0.00025 1 -0.0024885 0.011269 -0.22083 ...
22 22:16050607:G:A 16050607 G A 0.0006 1 0.013246 0.010527 1.2583 ...
22 22:16050627:G:T 16050627 G T 0.0004 1 -0.00043928 0.01008 -0.04358 ...
...

In their manual, they say the following about the allele coding:

In the regression model we code the first and second alleles as 0 and 1 respectively, so the beta coefficient refers to the effect of having an extra copy of the second allele.

So (just to be sure that there is not a random A1<->A2 swap in the summary stats format), I'd like to ask which allele (a_0 or a_1) is the reference (A1) and which one is the derived/effect (A2) allele in this context.

Cheers!

gwas bgenie summary statistics • 1.5k views

updated 3.5 years ago by Al Murphy ▴ 40 • written 4.8 years ago by gokberk ▴ 90

score 2 · Accepted Answer · 2021-11-22

I am not too familiar with BGENIE's summary statistics format, but the passage you quoted from the manual says that the second allele listed (a_1) is the effect allele (A2). That makes a_0 the reference (A1), and each beta is the effect of one extra copy of a_1.

If you want to know more about reference and alternative alleles, or want a tool that standardises any GWAS sumstats file and tests that the effect direction is not reversed (a_0 and a_1 values flipped), along with a whole host of other checks, check out the MungeSumstats R Bioconductor package (I wrote this package). You can pass the GWAS sumstats to MungeSumstats in one call, and it can correct any flipped alleles, ensure all alleles are on the reference genome, infer the reference genome, lift over between hg19 and hg38, etc.
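As a rough illustration of that reading, here is a minimal Python sketch that uses only the first rows quoted in the question. It labels a_1 as the effect allele, as the manual passage states. It also checks that the variant ID agrees with the a_0 and a_1 columns, which assumes the rsid follows a chr:pos:a_0:a_1 pattern (true of the sample rows, but not something the manual passage guarantees). Last, it shows how a beta would be re-expressed if the two alleles were swapped. The function names and output field names are made up for this example and are not part of BGENIE or MungeSumstats.

```python
# First rows of the BGENIE output quoted in the question
# (the trailing "..." columns are left out).
SAMPLE = """\
chr rsid pos a_0 a_1 af info pheno1_beta pheno1_se pheno1_t
22 22:16050075:A:G 16050075 A G 0.0001 1 0.00067749 0.01008 0.067215
22 22:16050115:G:A 16050115 G A 0.00545 1 -0.00022679 0.010577 -0.021441
22 22:16050213:C:T 16050213 C T 0.00635 1 -0.0053945 0.010732 -0.50266
"""


def read_bgenie(text):
    """Parse whitespace-delimited summary statistics into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def label_alleles(row):
    """Name the alleles explicitly: beta is per extra copy of a_1."""
    return {
        "rsid": row["rsid"],
        "non_effect_allele": row["a_0"],  # coded 0 in the regression
        "effect_allele": row["a_1"],      # coded 1 in the regression
        "beta": float(row["pheno1_beta"]),
        "se": float(row["pheno1_se"]),
    }


def id_matches_alleles(row):
    """Assumption: rsid looks like chr:pos:a_0:a_1, as in the sample rows."""
    parts = row["rsid"].split(":")
    return len(parts) == 4 and parts[2] == row["a_0"] and parts[3] == row["a_1"]


def flip(record):
    """Re-express the effect per extra copy of the other allele."""
    flipped = dict(record)
    flipped["effect_allele"] = record["non_effect_allele"]
    flipped["non_effect_allele"] = record["effect_allele"]
    flipped["beta"] = -record["beta"]  # the standard error does not change
    return flipped


rows = read_bgenie(SAMPLE)
records = [label_alleles(row) for row in rows]

if __name__ == "__main__":
    for row, record in zip(rows, records):
        print(record["rsid"], record["effect_allele"], record["beta"],
              "ID consistent:", id_matches_alleles(row))
```

Swapping the alleles only changes the sign of beta (and of the t statistic). The standard error stays the same, so a check like `id_matches_alleles` is a cheap way to spot a swapped row before merging with another dataset.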