My friend is using plink 1.9, this is a simplified version of what I am reading from someones elses' code
He first reads in a vcf, split pseudoatuosomal regions
plink \
--vcf file.vcf \
--double-id \
--keep-allele-order \
--make-bed \
--split-x hg19 \
--out outname \
zeroes heterozygous haploid calls
plink
--bfile outname \
--keep-allele-order \
--make-bed \
--set-hh-missing \
--out outname_hh_fixed
Merges the PAR back in
plink \
--bfile outname_hh_fixed \
--merge-x \
--make-bed \
--out outname_hh_fixed_merged
Converts the PLINK to VCF
plink \
--bfile outname_hh_fixed_merged \
--keep-allele-order \
--recode vcf-iid \
--real-ref-alleles \
--a2-allele vcfinfo.txt 4 3 0 \
--out final_vcf.vcf
If I understand --keep-allele-order
prevents plink from flipping around alleles and preserves the REF allele in the original VCF as A2 (major allele in plink)
I have two questions:
- Why is
--keep-allele-order
not needed when merging PAR back in? - Why does PLINK need you specify
--keep-allele-order
and--a2-allele
when writing the VCF in the last step? I think I understand from the documentation that column 4 of vcfinfo.txt will be used as a guide for setting alleles in that column to A2 (or REF in the final_vcf). I am just kind of stumped as to why you would need to supply a column of alleles to designate as A2 if we are already saying--keep-allele-order
. Wouldn't the correct order already be baked into the .bim file then? Thank you.
Thank you for the response. The goal is to preserve allele order throughout this pipeline. Is there a general reason behind why someone would want to let plink assign major/minor alleles when using merge-x to merge PAR back in? My thought here is that he was really trying to make sure that allele order was kept at the lest step and just let it go at the merging step.