I need to pick random sets of SNPs from the 1000 Genomes variant set files using VCFtools. Is there a command to do this?
To sample N lines without replacement using the sample tool:
$ N=1234
$ sample --sample-size=${N} foo.vcf > sample.${N}.vcf
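If the VCF header needs to stay intact in the output, one option is to split the header from the records and sample only the records. This is a minimal sketch, assuming GNU coreutils shuf is installed and the VCF is uncompressed; sample_vcf is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: sample N variant records without replacement,
# keeping all header lines. Assumes GNU coreutils `shuf` and an
# uncompressed VCF.
sample_vcf() {
    n="$1"
    vcf="$2"
    # Header lines (starting with '#') are copied through unchanged.
    grep '^#' "$vcf"
    # Data lines are sampled without replacement; shuf emits them in random order.
    grep -v '^#' "$vcf" | shuf -n "$n"
}
```

Usage would look like `sample_vcf 1234 foo.vcf > sample.1234.vcf`. The records come out in random order rather than sorted by position, so they may need re-sorting before use with tools that expect a sorted VCF.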
I wrote a simple tool to downsample VCF files: https://github.com/lindenb/jvarkit/wiki/DownSampleVcf
$ curl -skL "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz" |\
gunzip -c | java -jar downsamplevcf.jar -n 100 > out.vcf
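There's no simple way of doing this directly in vcftools (although using 'sample' seems a good suggestion). However, perhaps you could use the --thin option to achieve what you need? Note that --thin keeps sites so that no two are within a given distance of each other, so the result is an evenly spaced subset rather than a truly random one.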
What kind of output are you looking for? A smaller vcf with random lines from 1000 Genomes vcfs, or just a list of SNPs (rs ids, or list of chr,position,ref,alt)?
A list of SNPs (rs ids)
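For a plain list of rs ids, something along these lines might be enough. It is a rough sketch, assuming GNU coreutils shuf and an uncompressed VCF whose third column holds the ID; random_rsids is a hypothetical helper name. Records without an rs id (for example '.') are skipped, and entries with several ids joined by ';' are kept as a single item.

```shell
# Hypothetical helper: print N random rs ids from a VCF, one per line.
# Assumes GNU coreutils `shuf` and an uncompressed VCF.
random_rsids() {
    n="$1"
    vcf="$2"
    # Drop header lines, take the ID column (3rd field),
    # keep entries starting with "rs", then sample N without replacement.
    grep -v '^#' "$vcf" | cut -f3 | grep '^rs' | shuf -n "$n"
}
```

Usage would look like `random_rsids 100 foo.vcf > rsids.txt`. For a gzipped file, decompress it first (for instance with gunzip -c) and pass the uncompressed copy.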