plink --vcf chrX.M.vcf.gz --set-hh-missing --output-chr M --recode vcf --out chrX.M.set_hh_missing
but the output is distorted in that the sample IDs are doubled, for example
After reading the instructions, none of these options looks like it should do this. Why is this happening? How can I prevent the sample IDs from doubling like this?
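This happens because plink 1.9 tracks a family ID and a within-family ID for every sample. When importing a VCF it sets both to the VCF sample ID by default (the --double-id behavior), and "--recode vcf" then writes each sample as the two IDs joined by an underscore. To avoid this, add "--const-fid 0 --keep-allele-order", and replace "--recode vcf" with "--recode vcf-iid" in your command line.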
Use plink 2.0 when working with VCFs. In addition to natively supporting single-part sample IDs, it preserves VCF header lines and QUAL/FILTER/INFO columns, does not automatically swap REF/ALT alleles on you (this is what --keep-allele-order in the first workaround counteracts), and can handle multiallelic variants, phase, and dosage data.
Also note that you can add 'bgz' to --recode ("--recode vcf-iid bgz") to request bgzipping of the VCF.
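Applied to the command from the question, the first workaround might look like the sketch below. The file names are taken from the question. The guard that prints the command instead of running it when plink or the input file is missing is only there for illustration and is not part of the original answer.

```shell
# Input and output names from the question.
in_vcf="chrX.M.vcf.gz"
out_prefix="chrX.M.set_hh_missing"

# Original command plus --const-fid 0 and --keep-allele-order,
# with --recode vcf replaced by --recode vcf-iid.
# To bgzip the output as well, use: --recode vcf-iid bgz
set -- plink --vcf "$in_vcf" \
    --const-fid 0 --keep-allele-order \
    --set-hh-missing --output-chr M \
    --recode vcf-iid --out "$out_prefix"

# Run the command if plink and the input are available; otherwise just print it.
if command -v plink >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    "$@"
else
    printf '%s\n' "$*"
fi
```

With "--recode vcf-iid", only the within-family ID is written to the VCF header, so each sample ID should appear once, as in the input.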