Hi all, I'm trying to convert a VCF obtained from the Michigan Imputation Server to bfile (plink binary file format) for further processing. When I look at the VCF file, I find very long variant names built from the REF/ALT alleles. I want to limit the variant names to 100 characters.
I have tried all of the following plink options, but none of them works:
--set-missing-var-ids @:#[b37]
--set-missing-var-ids @:#[b37]\$1,\$2
--new-id-max-allele-len 50
--snps-only
--biallelic-only
At the end, I'm getting the following error: Error: Variant names are limited to 16000 characters.
We have imputed data sets from the Michigan Imputation Server (MIS), and plink2 throws the following error when reading them: "Error: Header line 11 of --vcf file does not have expected FORMAT:GT format."
We want to convert imputed data to ped files.
Hi, can you suggest how to set "," for longer alleles in VCF files? I used your suggested syntax in plink2 but still got a similar error: "Variant has longer than 16000 characters". I used the following syntax in plink2. Please let me know if I'm doing anything wrong. I would appreciate your time and effort.
You omitted a crucial piece of information: what's the input?
If it's a VCF with an overly long variant ID, you have to clear that ID first with e.g. a bash one-liner, because VCF import happens before --set-missing-var-ids in the order of operations.
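The answer above suggests a bash one-liner; as an alternative, here is a minimal Python sketch of the same idea, not a tested recipe for any particular data set. It assumes a tab-delimited VCF whose third column is the ID, and it replaces any ID longer than a cutoff with "." (the VCF missing-ID value). The function name clear_long_ids, the helper open_text, and the cutoff of 100 are illustrative choices, not part of plink or the VCF tooling.

```python
import gzip
import sys

MAX_ID_LEN = 100  # hypothetical cutoff; choose whatever limit suits your data


def clear_long_ids(lines, max_len=MAX_ID_LEN):
    """Yield VCF lines, replacing any ID (column 3) longer than max_len with '.'."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information and header lines pass through unchanged.
            yield line
            continue
        # Split off only the first three columns; the remainder stays intact.
        fields = line.split("\t", 3)
        if len(fields) > 2 and len(fields[2]) > max_len:
            fields[2] = "."
        yield "\t".join(fields)


def open_text(path, mode):
    """Open a plain or gzip-compressed file in text mode (helper for this sketch)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): python clear_ids.py in.vcf.gz out.vcf.gz
    with open_text(sys.argv[1], "r") as src, open_text(sys.argv[2], "w") as dst:
        dst.writelines(clear_long_ids(src))
```

Once the long IDs have been cleared this way, they are missing in the file that plink reads, so an option such as --set-missing-var-ids from the original post should have something to act on.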