Getting a sample of 1000 random SNPs with MAF
7.0 years ago
puchkovaan

I need to make a list of 1000 human SNPs, all with a 1000 Genomes MAF above 5%. The problem is that they should be truly random and not come from any particular chromosome or gene group. Any idea how to fetch such a set, other than using a random number generator and entering them one by one?

There are two relevant tags in dbSNP: G5 and G5A. G5 means MAF > 5% in at least one population, and G5A means MAF > 5% in all populations. When you say MAF > 5% in 1000 Genomes, do you mean in at least one population (G5) or in all populations (G5A)? In the first case, filter dbSNP on the G5 tag and then randomly sample the VCF records. To be safe, filter on both a 1000 Genomes tag (KGPROD in the example below) and G5/G5A.
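
The any-population (G5) versus all-populations (G5A) distinction can also be reproduced from per-population frequencies. A minimal sketch, assuming a 1000 Genomes phase 3-style sites VCF whose INFO column carries EAS_AF, AMR_AF, AFR_AF, EUR_AF and SAS_AF (ALT allele frequencies) on biallelic sites; classify_maf is a hypothetical helper name, and dbSNP's own population set behind G5/G5A may differ from these five.

```bash
# Sketch: emulate "MAF > threshold in at least one population" (mode=any, like G5)
# or "MAF > threshold in every population" (mode=all, like G5A).
# ASSUMPTION: INFO holds EAS_AF, AMR_AF, AFR_AF, EUR_AF, SAS_AF with one
# ALT frequency each (biallelic sites). A missing key is treated as AF = 0.
# Usage: classify_maf any|all THRESHOLD < sites.vcf
classify_maf() {
  local mode="$1" threshold="$2"
  awk -v mode="$mode" -v t="$threshold" '
    BEGIN {
      FS = OFS = "\t"
      npop = split("EAS_AF AMR_AF AFR_AF EUR_AF SAS_AF", pops, " ")
    }
    /^#/ { print; next }                      # pass header lines through unchanged
    {
      split("", info)                         # clear the INFO map for this record
      n = split($8, kv, ";")                  # column 8 is INFO: key=value;key=value
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        if (eq > 0) info[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
      }
      hits = 0                                # populations with MAF above threshold
      for (p = 1; p <= npop; p++) {
        af = info[pops[p]] + 0
        maf = (af > 0.5) ? 1 - af : af        # fold ALT frequency into a minor allele frequency
        if (maf > t) hits++
      }
      if ((mode == "any" && hits >= 1) || (mode == "all" && hits == npop)) print
    }'
}
```

For example, `classify_maf any 0.05 < sites.vcf` keeps G5-like records and `classify_maf all 0.05 < sites.vcf` keeps G5A-like ones (sites.vcf is a placeholder name).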

Example code:

$ bcftools view -i 'G5=1 && KGPROD=1' dbsnp.138.chr20.vcf | vcfrandomsample -r 0.01

Similar post here on Biostars: Picking random SNPs from 1000 Genomes using Vcftools. vcflib provides vcfrandomsample, which keeps records at a given rate (a fraction, not a fixed count), so compute the rate from your total number of records to get about 1000 variants. You need to run this on the dbSNP VCF for all chromosomes rather than only chromosome 20 as above. Alex has code for randomly sampling a VCF: https://github.com/alexpreynolds/sample.
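
The rate for vcfrandomsample is just the target count divided by the number of data records after filtering. A minimal sketch of that arithmetic; sample_rate is a hypothetical helper name, and it only counts lines, it does not call any vcflib or bcftools API.

```bash
# Sketch: compute the rate (fraction) that keeps roughly TARGET records.
# Rate-based sampling keeps each record independently, so the kept count
# is only approximately TARGET (binomially distributed around it).
# Usage: sample_rate TARGET < filtered.vcf
sample_rate() {
  local target="$1"
  awk -v target="$target" '
    !/^#/ { n++ }                       # count data records, skip header lines
    END {
      if (n == 0) { print 0; exit }     # nothing to sample from
      r = target / n
      if (r > 1) r = 1                  # cannot keep more than every record
      printf("%.6g\n", r)
    }'
}
```

Usage, with dbsnp.vcf.gz as a placeholder file name: `rate=$(bcftools view -i 'G5=1 && KGPROD=1' dbsnp.vcf.gz | sample_rate 1000)`, then `bcftools view -i 'G5=1 && KGPROD=1' dbsnp.vcf.gz | vcfrandomsample -r "$rate"`. With 1,000,000 records the rate is 0.001 and the kept count has a standard deviation of about sqrt(1000 × 0.999) ≈ 31.6, so expect roughly 940 to 1060 variants rather than exactly 1000.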

7.0 years ago

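Using vcffilterjdk (from jvarkit) followed by awk | sort. The AF field in the 1000 Genomes sites file is the ALT allele frequency, so MAF > 5% means 0.05 < AF < 0.95; the filter also keeps only biallelic SNPs, grep drops only header lines (those starting with '#'), and the final cut removes the random sort key:

$ curl "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz" |\
gunzip -c |\
java -jar dist/vcffilterjdk.jar -e 'return variant.isSNP() && variant.isBiallelic() && variant.getAttributeAsList("AF").stream().mapToDouble(S->Double.parseDouble(S.toString())).anyMatch(V->V>0.05 && V<0.95);' |\
grep -v '^#' |\
awk '{printf("%f\t%s\n",rand(),$0);}' |\
sort -t $'\t' -k1,1g |\
head -n 1000 |\
cut -f 2-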
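
The awk rand() | sort | head stage is a random-key shuffle: every record gets a random key and the 1000 smallest keys win. If sorting millions of lines is too slow, reservoir sampling (Algorithm R, a swapped-in technique rather than what this answer uses) picks exactly K lines uniformly in one pass while holding only K lines in memory. A minimal sketch; reservoir_sample is a hypothetical helper name.

```bash
# Sketch: reservoir sampling (Algorithm R). Every data line ends up in the
# output with equal probability K/N. Header lines starting with '#' are skipped.
# SEED is optional; without it srand() seeds from the current time.
# Usage: reservoir_sample K [SEED] < records
reservoir_sample() {
  local k="$1" seed="${2:-}"
  awk -v k="$k" -v seed="$seed" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    /^#/ { next }
    {
      n++
      if (n <= k) {
        res[n] = $0                       # fill the reservoir with the first K lines
      } else {
        j = int(rand() * n) + 1           # uniform integer in 1..n
        if (j <= k) res[j] = $0           # replace a slot with probability K/n
      }
    }
    END {
      m = (n < k) ? n : k                 # fewer than K lines: print them all
      for (i = 1; i <= m; i++) print res[i]
    }'
}
```

To use it, feed the output of the vcffilterjdk step into `reservoir_sample 1000` instead of the grep/awk/sort/head stages. The output keeps reservoir order rather than a shuffled order, which does not matter for drawing a random set.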

Hi Pierre, what's the best way to sample a list of SNPs that have the same major allele frequency in 1000 Genomes as my own SNP set? Thanks.

Hi Shicheng,

I also need to make a set of random SNPs with allele frequencies similar to those in my SNP list. Could you please let me know how you solved the problem?
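
The frequency-matched question is left open in the thread. One common approach, offered only as a hedged sketch and not as anything proposed above, is to bin MAF into fixed-width bins, count your own SNPs per bin, and draw the same number of background SNPs from each bin (matching on MAF is equivalent to matching on major allele frequency at biallelic sites, since one is 1 minus the other). It assumes hypothetical tab-separated inputs with columns ID and MAF (0 to 0.5) and no header, for example extracted from the 1000 Genomes sites VCF; match_maf, the file layout, and the 0.05 default bin width are all assumptions.

```bash
# Sketch: frequency-matched background sampling via per-bin reservoir sampling.
# ASSUMED inputs: TARGET.tsv and BACKGROUND.tsv, tab-separated "ID<TAB>MAF",
# no header; TARGET.tsv must be non-empty (the FNR == NR trick relies on it).
# Output: one "ID<TAB>bin" line per sampled background SNP.
# Usage: match_maf TARGET.tsv BACKGROUND.tsv [BIN_WIDTH] [SEED]
match_maf() {
  local target="$1" background="$2" width="${3:-0.05}" seed="${4:-}"
  awk -F '\t' -v w="$width" -v seed="$seed" '
    BEGIN { if (seed != "") srand(seed); else srand() }
    function bin(maf) { return int(maf / w) }          # MAF -> bin index
    FNR == NR { need[bin($2)]++; tid[$1] = 1; next }   # pass 1: target counts per bin
    ($1 in tid) { next }                               # never pick a target SNP itself
    {
      b = bin($2)
      if (!(b in need)) next                           # bin not present in the target set
      seen[b]++
      if (seen[b] <= need[b]) {
        pick[b, seen[b]] = $1                          # fill this bin reservoir
      } else {
        j = int(rand() * seen[b]) + 1
        if (j <= need[b]) pick[b, j] = $1              # replace with probability need/seen
      }
    }
    END {
      for (b in need) {
        have = seen[b] + 0
        m = (have < need[b]) ? have : need[b]
        if (m < need[b])
          printf("bin %d: only %d of %d matched\n", b, m, need[b]) > "/dev/stderr"
        for (i = 1; i <= m; i++) print pick[b, i] "\t" b
      }
    }' "$target" "$background"
}
```

Bins where the background has fewer SNPs than your set are reported on stderr; a narrower bin width gives a closer match but makes such shortfalls more likely.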
