Picking random SNPs from 1000 Genomes using Vcftools
9.7 years ago
pifferdavide ▴ 110

I need to pick random sets of SNPs from the 1000 Genomes variant set files using VCFtools. Is there a command to do this?

vcftools 1000 genomes snp SNP • 9.7k views

What kind of output are you looking for? A smaller vcf with random lines from 1000 Genomes vcfs, or just a list of SNPs (rs ids, or list of chr,position,ref,alt)?


A list of SNPs (rs ids)
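
If a plain list of rs IDs is all that is needed, a minimal sketch with standard Unix tools might look like the following. It assumes an uncompressed VCF whose ID column (column 3) holds the rs IDs and that GNU shuf is installed; the function name random_rsids is made up for illustration.

```shell
# random_rsids N FILE.vcf
# Print N randomly chosen rs IDs (VCF column 3), sampled without replacement.
# Hypothetical helper name; assumes GNU coreutils `shuf` is available.
random_rsids() {
    n="$1"
    vcf="$2"
    grep -v '^#' "$vcf" |      # drop header lines
        cut -f 3 |             # keep only the ID column
        grep '^rs' |           # skip records without an rs ID (e.g. ".")
        shuf -n "$n"           # pick N distinct lines at random
}
```

For example, `random_rsids 100 sites.vcf > random_ids.txt` would write 100 IDs, one per line. A file in that one-ID-per-line form could then be passed to vcftools with its --snps option if the matching VCF records are wanted as well.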


>-(

9.7 years ago

To sample without replacement using sample, a standalone command-line tool that is separate from VCFtools:

$ N=1234
$ sample --sample-size=${N} foo.vcf > sample.${N}.vcf
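
One caveat when sampling a VCF line by line: if the sampler treats every line alike, the header lines starting with # would be sampled along with the records. A rough sketch of keeping the header intact is shown here, with GNU shuf swapped in for sample purely for illustration; sample_vcf_records is a hypothetical helper name, and an uncompressed input file is assumed.

```shell
# sample_vcf_records N FILE.vcf
# Copy the VCF header unchanged, then append N randomly chosen records.
# Hypothetical helper; uses GNU `shuf` as a stand-in sampler.
sample_vcf_records() {
    n="$1"
    vcf="$2"
    grep '^#' "$vcf"                      # header lines, kept as-is
    grep -v '^#' "$vcf" | shuf -n "$n"    # N records, without replacement
}
```

The sampled records come out in random order rather than genomic order, so the result may need sorting before tools that expect a position-sorted VCF will accept it.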

It looks like sample is not a VCFtools command.


Alex clearly pointed to a tool that is not vcftools.


Reread my question "Picking random SNPs from 1000 Genomes using Vcftools". Wrong answer since I asked how to do that job using vcftools!


To paraphrase the great English philosopher Mick Jagger, "You can't always get what you want. But if you ask some time, then you might find, there's a different tool that will actually do what you want."


Sure. I tried to install your downsamplevcf, but there are too many previous steps. I installed jvarkit but it still won't work. The ant command isn't recognized. I suppose I'll have to install Apache Ant too? Sorry for these newbie questions...


My advice would be not to use this answer.


Sorry, which answer do you mean? Please let me know if you have any suggestions.

9.7 years ago

I wrote a simple tool to downsample vcf files: https://github.com/lindenb/jvarkit/wiki/DownSampleVcf

$ curl -skL "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz" |\
gunzip -c | java -jar downsamplevcf.jar -n 100 > out.vcf
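
Because the command above reads the VCF from a stream, the number of records is not known in advance. One common way to draw a fixed-size random sample in a single pass is reservoir sampling. The awk sketch below illustrates that general technique only; it is not a description of how downsamplevcf.jar works internally, and the function name reservoir_vcf and its optional seed argument are made up for this example.

```shell
# reservoir_vcf N [SEED]
# Read a VCF on stdin; print the header plus N records chosen uniformly
# at random in one pass (reservoir sampling). Hypothetical helper name.
reservoir_vcf() {
    awk -v n="$1" -v seed="${2:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        /^#/  { print; next }               # pass header lines straight through
        {
            seen++
            if (seen <= n) {
                keep[seen] = $0             # fill the reservoir first
            } else {
                j = int(rand() * seen) + 1  # uniform integer in 1..seen
                if (j <= n) keep[j] = $0    # replace a slot with probability n/seen
            }
        }
        END {
            m = (seen < n) ? seen : n       # fewer than N records: print them all
            for (i = 1; i <= m; i++) print keep[i]
        }
    '
}
```

It would slot into the same kind of pipeline, for instance `gunzip -c sites.vcf.gz | reservoir_vcf 100 > out.vcf`. Without a seed, awk seeds from the clock, so two runs started within the same second can return the same sample.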

I installed jvarkit but it won't let me install downsamplevcf. I get the following error message: curl command not found.

How do I install curl?


Awesome. But now it doesn't read the command "ant". What do I need to install next?


Hi Pierre,

Using downsamplevcf.jar, is there any way to get random SNPs with LD and allele frequency similar to those of the SNPs we are studying?

9.7 years ago
Adam ★ 1.0k

There's no simple way of doing this directly in vcftools (although using 'sample' seems a good suggestion). However, perhaps you could use the --thin option to achieve what you need?
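
For context, --thin takes a distance in base pairs and keeps sites so that no two retained sites lie closer together than that distance, which gives an evenly spaced subset rather than a truly random one. The awk sketch below mimics that behaviour on plain text for illustration only. thin_vcf is a hypothetical name, the input is assumed to be sorted by chromosome and position, and the real vcftools implementation may differ in details such as boundary handling.

```shell
# thin_vcf DISTANCE < in.vcf
# Keep a site only if it is at least DISTANCE bp past the last kept site
# on the same chromosome. Roughly what one would expect from:
#   vcftools --vcf in.vcf --thin DISTANCE --recode --out thinned
# Hypothetical helper; assumes input sorted by chromosome and position.
thin_vcf() {
    awk -F '\t' -v d="$1" '
        /^#/ { print; next }                 # keep header lines
        $1 != chrom || $2 - last >= d {      # new chromosome, or far enough away
            print
            chrom = $1
            last = $2
        }
    '
}
```

A larger distance leaves fewer sites, so the number of SNPs returned is controlled only indirectly, and the same input always yields the same subset.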


Never mind, I found SNPSNAP
