I have a collection of genotyping files that, for the purposes of this question, I will call big.bed, big.ped, big.fam, big.map, big.vcf, etc. This dataset has information on ~1.8M SNPs and 877 samples.
I also have a list of ~1000 SNPs in a file wanted_snps.txt, one SNP per line.
I want to generate a collection of files tiny.bed, tiny.ped, tiny.fam, tiny.map, tiny.vcf containing only the subset of the data in the big.* files that corresponds to the SNPs listed in wanted_snps.txt.
(In case it matters, we can safely assume that all the SNPs mentioned in wanted_snps.txt are represented in the big.* dataset.)
I understand that one can perform such subsetting using plink, but after poring over the online documentation, I still can't figure out how to do this.
Could someone show me what commands I'd need to run to do this?
I am using plink version 1.9.
Thanks in advance!