Hello,
I am hoping somebody with PLINK experience can help. I am trying to generate PLINK .bim, .fam and .bed files from a .vcf (one with variants filtered out and one that keeps the variants), and I have tried several commands that I found in Biostars posts and on Google.
The documentation on converting .vcf to PLINK files is fairly sparse, so I'd like to check with more experienced researchers here whether I am proceeding correctly.
My outcomes fall into two camps. The .fam file is generated once I add the --allow-extra-chr flag at the end of the command. For both the .bim and .bed files, I get an error:
Error: out.hg38NoVariants-temporary.pvar.zst has a split chromosome. Use
--make-pgen + --sort-vars to remedy this.
Below are the commands I am trying and the output/errors I am receiving. I would be very grateful if somebody could tell me whether my .fam files are correct and what I need to do to generate all three files successfully, including exactly how to use "--make-pgen" and "--sort-vars".
Are these producing the correct .fam files?
./plink2 --vcf out.hg38KeepVariants.vcf --make-just-fam --out out.hg38KeepVariants --allow-extra-chr
Writing out.hg38KeepVariants.fam ... done.
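One quick sanity check on the .fam output is to compare its line count with the number of sample columns in the VCF header, since a .fam file has one line per sample. The following is only a minimal sketch: the function name vcf_sample_count is hypothetical, and it assumes an uncompressed, tab-delimited VCF with a standard #CHROM header line.

```shell
# Hypothetical helper (not part of plink2): count the sample columns in a VCF.
# Assumes an uncompressed, tab-delimited VCF.
vcf_sample_count() {
  # The #CHROM line has 9 fixed columns (CHROM through FORMAT),
  # followed by one column per sample.
  awk -F'\t' '/^#CHROM/ { n = NF - 9; print (n > 0 ? n : 0); exit }' "$1"
}
```

If `vcf_sample_count out.hg38KeepVariants.vcf` and `wc -l < out.hg38KeepVariants.fam` agree, the .fam file at least has the expected number of samples. This does not validate the sex or phenotype columns, which a VCF does not normally carry.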
My .bed command asks me to add the --allow-extra-chr flag, but after adding the flag I get a different error:
./plink2 --vcf out.hg38NoVariants.vcf --make-bed --out out.hg38NoVariants
Error: Invalid chromosome code '15_KI270727v1_random' on line 382274 of --vcf
file.
(Use --allow-extra-chr to force it to be accepted.)
...now with the --allow-extra-chr flag added:
./plink2 --vcf out.hg38NoVariants.vcf --make-bed --out out.hg38NoVariants --allow-extra-chr
Error: out.hg38NoVariants-temporary.pvar.zst has a split chromosome. Use
--make-pgen + --sort-vars to remedy this.
...and generating a .bim file fails with the same split-chromosome error, with or without the flag:
./plink2 --vcf out.hg38NoVariants.vcf --make-just-bim --out out.hg38NoVariants
Error: out.hg38NoVariants.vcf has a split chromosome. Use --make-pgen +
--sort-vars to remedy this.
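As far as I understand it, the "split chromosome" error means that records for the same chromosome code appear in more than one non-contiguous block of the file, which can happen when alt or random contigs such as 15_KI270727v1_random are interleaved with the primary chromosomes. The sketch below lists which codes are affected. The function name find_split_chroms is hypothetical, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper (not part of plink2): print chromosome codes whose
# records are spread over more than one contiguous block of the VCF.
find_split_chroms() {
  awk -F'\t' '
    /^#/ { next }                            # skip header lines
    $1 != prev {                             # chromosome changed: a new block starts
      if (seen[$1]++) split_chr[$1] = 1      # this code already had a block earlier
      prev = $1
    }
    END { for (c in split_chr) print c }
  ' "$1" | sort
}
```

Running `find_split_chroms out.hg38NoVariants.vcf` should print nothing for a file whose chromosomes are each contiguous. Any codes it does print are the ones that --sort-vars would need to regroup.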
I've preprocessed data before but never SNP data. Again, if anybody has experience with this pipeline, I'd appreciate your help. Thank you.
Thank you. I added --allow-extra-chr and a .fam file was made. Is that output correct?
./plink2 --vcf out.hg38KeepVariants.vcf --make-just-fam --out out.hg38KeepVariants --allow-extra-chr
I then used your advice and used the command:
./plink2 --vcf out.hg38NoVariants.vcf --make-pgen --out out.hg38NoVariants --allow-extra-chr --sort-vars
...which generated a .pgen file.
I then used:
./plink2 --pfile out.hg38NoVariants --make-just-bim --out out.hg38NoVariants --allow-extra-chr
...and
./plink2 --pfile out.hg38KeepVariants --make-bed --out out.hg38KeepVariants --allow-extra-chr
...which both ran through without error. Are they correct though?
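A run that finishes without error does not prove the files are right, but their internal consistency can be checked. In the PLINK 1 binary format, the .fam file has one line per sample, the .bim file has one line per variant, and a variant-major .bed file consists of 3 magic bytes followed by ceil(samples / 4) bytes per variant. The sketch below checks that these sizes agree. The function name check_bfile is hypothetical, and it only tests file sizes, not genotype content.

```shell
# Hypothetical helper (not part of plink2): check that the .bed file size
# matches the sample and variant counts implied by .fam and .bim.
check_bfile() {
  prefix=$1
  n_samples=$(( $(wc -l < "$prefix.fam") ))    # one line per sample
  n_variants=$(( $(wc -l < "$prefix.bim") ))   # one line per variant
  # 3 magic bytes, then ceil(n_samples / 4) bytes for each variant
  expected=$(( 3 + n_variants * ( (n_samples + 3) / 4 ) ))
  actual=$(( $(wc -c < "$prefix.bed") ))
  echo "samples=$n_samples variants=$n_variants bed_bytes=$actual expected=$expected"
  [ "$actual" -eq "$expected" ]                # exit status 0 only if sizes agree
}
```

For example, `check_bfile out.hg38KeepVariants && echo consistent` prints the counts and then "consistent" when the three files agree with each other.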