I have a VCF that I am reading into PLINK.
This VCF has many variants that are redundant in position (same chrom + base pair position).
I read this VCF into PLINK to do some data manipulations, then convert PLINK to VCF using the internal PLINK functionality.
I am noticing that variants with redundant positions are not always written to the PLINK-produced VCF in the same order as in the original VCF (I'm talking about the order of the variants themselves, not the flipping of alleles).
i.e.
The order in the original VCF:
variant A (redundant position with variant B)
variant B (redundant position with variant A)
might be written out from PLINK in the reverse order:
variant B (redundant position with variant A)
variant A (redundant position with variant B)
All the variants with unique positions seem fine.
Is there a way to force a variant order for these redundant variants by using a reference file, or is there a way to have PLINK output a file that lists variants in the exact order that they will be written to a VCF? I thought about using the .bim file as a reference for the order that PLINK will use, but I am not sure if that is accurate. Thank you.
But this is for the major/minor allele order of a single variant, correct? I am looking for the order of the variants themselves (i.e., the order of rsIDs).
Yeah, I misread your question.
One way to enforce a specific ID order is:
Create a temporary plink fileset with one sample (give it a new sample ID) and the desired variant ID order. All genotypes can be missing.
plink --bfile temp --bmerge <real fileset> --out merged
plink --bfile merged --remove temp.fam --make-bed --out sorted
This should work because --bmerge uses the variant ID order in the base fileset when possible.
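For the first step, here is a rough sketch of one way to build the temporary one-sample fileset from the original VCF. The function name make_order_fileset, the sample ID ORDER_TEMP, and the file names are made up for illustration. It assumes the VCF is tab-delimited and that every variant has a unique ID in the third column (no "." placeholders).

```shell
# Hypothetical helper: write PREFIX.map and PREFIX.ped for a one-sample,
# all-missing text fileset whose variant order follows the given VCF.
make_order_fileset() {
    vcf="$1"
    prefix="$2"
    # .map columns: chromosome, variant ID, genetic distance (0), bp position.
    # Assumes column 3 of the VCF holds a unique ID for every variant.
    grep -v '^#' "$vcf" \
        | awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $3, 0, $2 }' > "$prefix.map"
    # .ped: FID, IID, father, mother, sex, phenotype, then two alleles per
    # variant; "0 0" marks a missing genotype call.
    awk 'END {
        printf "ORDER_TEMP ORDER_TEMP 0 0 0 -9"
        for (i = 0; i < NR; i++) printf " 0 0"
        printf "\n"
    }' "$prefix.map" > "$prefix.ped"
}
```

Something like "make_order_fileset original.vcf temp" followed by "plink --file temp --make-bed --out temp" should then give the binary fileset used in the merge step. I have not verified how PLINK orders same-position variants when loading the text fileset, so it is worth checking temp.bim against the original VCF before merging.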
Thank you so much and PLINK is amazing!