Hello,
I'm trying to make a plink2 file following the advice in "Converting VCF to PLINK .bed binary fileset to check for pedigree errors with KING: How do conversion tools make the PLINK .fam file, without asking for family relationships a priori?",
so I ran the command
plink2 --vcf 56001801066929_WGZ.snp.vcf.gz --make-bed --out ex
and get the output:
PLINK v2.00a3.1LM 64-bit Intel (19 May 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ex.log.
Options in effect:
--make-bed
--out ex
--vcf 56001801066929_WGZ.snp.vcf.gz
Start time: Wed Jul 20 10:22:04 2022
70358 MiB RAM detected; reserving 35179 MiB for main workspace.
Using up to 10 threads (change this with --threads).
--vcf: 3499678 variants scanned.
--vcf: ex-temporary.pgen + ex-temporary.pvar.zst + ex-temporary.psam written.
1 sample (0 females, 0 males, 1 ambiguous; 1 founder) loaded from
ex-temporary.psam.
3499678 variants loaded from ex-temporary.pvar.zst.
Note: No phenotype data present.
Writing ex.fam ... done.
Writing ex.bim ...
Error: ex.bim cannot contain multiallelic variants.
End time: Wed Jul 20 10:22:07 2022
which only creates ex.log and ex.fam, so there is no .bed output. I can make the file with plink1.9, but I'm running into other bugs with that one (a "split chromosome" error after sorting with bcftools), which is the point of trying plink2.
How can I make a bed file with plink2?
The error message from plink2 means that your VCF contains multiallelic variants, which a .bim file cannot represent. Have you tried filtering your VCF to only include biallelic SNPs?
See this discussion for one way to do that:
how to remove multiallelic from VCF
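In case it helps, here is a rough sketch of that filtering route, not a tested pipeline for your data. The function names and the intermediate file name are made up for illustration; the bcftools options (-m2 -M2 to keep sites with exactly two alleles, -v snps to keep SNPs) are standard bcftools view flags. The first function just counts how many records have more than one ALT allele, so you can see how much you would lose.

```shell
#!/usr/bin/env bash
# Sketch only: function names and intermediate file names are hypothetical.

# Count records whose ALT column (field 5) lists more than one allele.
count_multiallelic() {
    local vcf="$1"
    # Decompress only if the name ends in .gz; otherwise read as plain text.
    case "$vcf" in
        *.gz) gzip -cd "$vcf" ;;
        *)    cat "$vcf" ;;
    esac | awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}

# Keep only biallelic SNPs, then convert to a PLINK 1 binary fileset.
biallelic_snps_to_bed() {
    local in_vcf="$1" out_prefix="$2"
    bcftools view -m2 -M2 -v snps -Oz -o "${out_prefix}.biallelic.vcf.gz" "$in_vcf" &&
    plink2 --vcf "${out_prefix}.biallelic.vcf.gz" --make-bed --out "$out_prefix"
}

# Usage (not run here):
#   count_multiallelic 56001801066929_WGZ.snp.vcf.gz
#   biallelic_snps_to_bed 56001801066929_WGZ.snp.vcf.gz ex
```

Note that this drops the multiallelic sites entirely, which is usually fine for relatedness checks but loses those genotypes.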
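Use bcftools norm (https://samtools.github.io/bcftools/bcftools.html#norm) to split multiallelic sites into multiple biallelic records if retaining all the called alleles is crucial for your work. Then use plink2 to make the .bed fileset, adding the --set-all-var-ids option to name all of your variants so that multiple alleles at the same position can be told apart. Note that a template of @:# alone gives every record split from one site the same ID; a template that includes the alleles, such as @:#:$r:$a, keeps them distinct.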
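A minimal sketch of the split-then-convert route, assuming bcftools and plink2 are on your PATH. The function name and the intermediate file name are hypothetical. The ID template used here appends REF and ALT ($r and $a) to chromosome and position, because records split from the same site share both of those; it is single-quoted so the shell does not try to expand $r and $a.

```shell
#!/usr/bin/env bash
# Sketch only: function name and intermediate file name are hypothetical.

split_multiallelic_to_bed() {
    local in_vcf="$1" out_prefix="$2"
    # -m -any splits every multiallelic record into separate biallelic records.
    bcftools norm -m -any -Oz -o "${out_prefix}.split.vcf.gz" "$in_vcf" &&
    # Name each variant chrom:pos:ref:alt so split records get distinct IDs.
    plink2 --vcf "${out_prefix}.split.vcf.gz" \
           --set-all-var-ids '@:#:$r:$a' \
           --make-bed --out "$out_prefix"
}

# Usage (not run here):
#   split_multiallelic_to_bed 56001801066929_WGZ.snp.vcf.gz ex
```

Whether downstream tools such as KING handle several records at one position the way you want is worth checking before relying on the result.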