Hi All,
I have a dataset containing more than 500K SNPs. Now I need to extract a random 15K SNPs from it. Please help me do so.
Thanks
plink --bfile file1 --recode --out file2
cut -f 2 file2.map > snps.map
shuf -n 15000 snps.map > snps.subset.map
plink --bfile file1 --extract snps.subset.map --make-bed --out file3
I think the Unix command shuf does the trick (assuming the SNPs are one per line in a text/VCF file; note that any # header lines in a VCF would be sampled along with the records):
shuf -n 15000 snps_file.vcf
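If the input really is a VCF, a rough sketch of a header-safe variant is below. It is only an illustration, not something from the answer above: the function name sample_vcf_records and the file names are made up, and it assumes an uncompressed VCF and GNU shuf. It prints the # header lines unchanged, samples N records, and puts the sampled records back in their original order.

```shell
#!/usr/bin/env bash
# sample_vcf_records VCF N  (hypothetical helper, assumes an uncompressed VCF and GNU shuf)
# Prints all header lines, then N randomly chosen records in their original order.
sample_vcf_records() {
    local vcf="$1" n="$2"
    # Header lines (## meta lines and the #CHROM line) are passed through as-is.
    grep '^#' "$vcf"
    # Tag each record with its position, sample N of them,
    # sort back into file order, then drop the tag again.
    grep -v '^#' "$vcf" \
        | awk '{print NR "\t" $0}' \
        | shuf -n "$n" \
        | sort -k1,1n \
        | cut -f2-
}

# Example (placeholder file names):
#   sample_vcf_records snps_file.vcf 15000 > snps_subset.vcf
```

Restoring the original order matters because many downstream tools expect VCF records sorted by position.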
You can also just use the UNIX sort to randomly grab lines out of your BIM file:
sort -R yourdata.bim | head -15000 | awk '{print$2}' > random15k.snps
plink --bfile yourdata --extract random15k.snps --make-bed --out random15k
This avoids the time and disk space needed to convert your file to a plain-text PED file, and keeps everything binary for speed and disk friendliness =)
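The BIM-based approach can be wrapped up as a small helper, roughly like this. It is a sketch only: sample_snp_ids is a made-up name, the file names are placeholders, and it assumes GNU shuf and a standard six-column .bim with the variant ID in column 2.

```shell
#!/usr/bin/env bash
# sample_snp_ids BIM N  (hypothetical helper, assumes GNU shuf)
# Prints N randomly chosen variant IDs from a PLINK .bim file, one per line.
sample_snp_ids() {
    local bim="$1" n="$2"
    # .bim columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
    awk '{print $2}' "$bim" | shuf -n "$n"
}

# Example (placeholder file names):
#   sample_snp_ids yourdata.bim 15000 > random15k.snps
#   plink --bfile yourdata --extract random15k.snps --make-bed --out random15k
```

shuf samples without replacement, so the IDs are distinct as long as the .bim has no duplicate IDs. If you are on PLINK 1.9, I believe its --thin-count option (e.g. --thin-count 15000) does the same random thinning inside plink itself, but check the documentation for your version.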
How do the answers know that you have bfile (BED) as input?