How To Extract Random SNPs From Whole Genome Data?
12.9 years ago
User 1793 ▴ 40

Hi All,

I have a dataset containing more than 500K SNPs. I need to extract 15K random SNPs from it. Please help me do so.

Thanks

snp extraction • 15k views

How do the answers know that you have a bfile (BED) as input?

12.9 years ago

Convert your file to PED format

plink --bfile file1 --recode --out file2

Extract the SNP ID column (column 2 of the MAP file)

cut -f 2 file2.map > snps.map

Choose 15k SNPs at random

shuf -n 15000 snps.map > snps.subset.map

Extract those SNPs from your first file

plink --bfile file1 --extract snps.subset.map --make-bed --out file3

This adds an unnecessary step of making a plain-text PED file, and not all Unix systems have shuf installed. Running sort -R on the BIM file is all you need.


perfect! Thanks a lot.

12.9 years ago
Pablo ★ 1.9k

I think the unix command shuf does the trick (assuming the SNPs are one per line in a text/VCF file)

shuf -n 15000 snps_file.vcf
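
One caveat with running shuf on a whole VCF: the header lines (those starting with #) are sampled along with the variants, and the output order is scrambled. A minimal sketch of one way around that, assuming GNU shuf is available; demo.vcf, subset.vcf and n=10 are hypothetical stand-ins so the example runs on its own (use your own file and 15000 in practice):

```shell
# Tiny stand-in VCF (hypothetical data) so the sketch is self-contained.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  for i in $(seq 1 50); do
    printf '1\t%s\trs%s\tA\tG\t.\t.\t.\n' "$((i * 100))" "$i"
  done
} > demo.vcf

n=10   # number of variants to keep; 15000 for the dataset in the question

# Copy the header lines unchanged.
grep '^#' demo.vcf > subset.vcf

# Sample n variant lines, then restore chromosome/position order.
# Note: chromosomes are sorted as text here, so chr10 sorts before chr2.
grep -v '^#' demo.vcf | shuf -n "$n" | sort -k1,1 -k2,2n >> subset.vcf

grep -vc '^#' subset.vcf   # number of variant lines kept
```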

thanks, I have the file in .bed, .bim, .fam format!


Use PLINK to create a VCF file and follow Pablo's suggestion. Or use the PLINK R interface to do the same.

plink --bfile file1 --recode --out file2
cut -f 2 file2.map > snps.map
shuf -n 15000 snps.map > snps.subset.map
plink --bfile file1 --extract snps.subset.map --make-bed --out file3

The shuf command is not found when I try to run this in the terminal on OS X.
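
If shuf is not available, one possible workaround is to tag each SNP ID with a random number in awk, sort on that number, and keep the first n lines. This is only a sketch under assumptions: demo.bim, random.snps, n=15 and seed=42 are hypothetical stand-ins, and a fixed seed reproduces the same sample only with the same awk implementation.

```shell
# Tiny stand-in BIM file (hypothetical data) so the sketch runs on its own.
for i in $(seq 1 100); do
  printf '1\trs%s\t0\t%s\tA\tG\n' "$i" "$((i * 1000))"
done > demo.bim

n=15      # number of SNPs to keep; 15000 for the dataset in the question
seed=42   # hypothetical fixed seed; change it to draw a different sample

# Prefix each SNP ID (column 2) with a random key, sort numerically on the
# key, keep the first n lines, then drop the key again.
awk -v seed="$seed" 'BEGIN { srand(seed) } { printf "%.10f\t%s\n", rand(), $2 }' demo.bim \
  | sort -k1,1n \
  | head -n "$n" \
  | cut -f 2 > random.snps

wc -l < random.snps   # number of SNP IDs written
```

The resulting random.snps list can then be passed to plink --extract in the same way as the shuf output.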

12.9 years ago
Caddymob ★ 1.0k

You can also just use UNIX sort to randomly grab lines out of your BIM file:

sort -R yourdata.bim | head -n 15000 | awk '{print $2}' > random15k.snps
plink --bfile yourdata --extract random15k.snps --make-bed --out random15k

This avoids the time and disk space needed to convert your data to a plain-text PED file, and keeps everything binary for speed and disk friendliness =)
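
One thing that may be worth checking first: --extract matches on the variant ID in column 2 of the BIM file, so if IDs are repeated (for example "." placeholders), the output can contain more variants than the number of IDs sampled. A small sketch of such a check; demo_dup.bim and duplicate_ids.txt are hypothetical names used only for illustration:

```shell
# Stand-in BIM file with one repeated ID (hypothetical data).
printf '1\trs1\t0\t100\tA\tG\n'  > demo_dup.bim
printf '1\trs2\t0\t200\tA\tG\n' >> demo_dup.bim
printf '1\t.\t0\t300\tA\tG\n'   >> demo_dup.bim
printf '1\t.\t0\t400\tA\tG\n'   >> demo_dup.bim

# Count how often each ID (column 2) occurs and report those seen more than once.
awk '{ count[$2]++ }
     END { for (id in count) if (count[id] > 1) print id, count[id] }' \
  demo_dup.bim > duplicate_ids.txt

cat duplicate_ids.txt
```

An empty duplicate_ids.txt suggests that each sampled ID corresponds to exactly one variant.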


thanks. this also worked perfectly.
