Extracting SNPs from multiple VCF files that are present in a proportion of samples
5.7 years ago
ankit hinsu ▴ 10

Hi,

I have VCF files for 25 samples (all called with freebayes against the same reference). I want to extract the SNPs that are present in at least 80% of the samples (i.e. in any 20 of the 25). Kindly help me with this.

I have tried "bcftools isec". It outputs SNPs that are present in at least 20 samples, which is what I want, but whichever sample is listed first in the file list is used as the reference. Because of this, only SNPs that are present in my first sample plus any other 19 samples are output, which is not what I want. It should output SNPs present in any 20 samples, regardless of which ones.

I hope I have explained my problem clearly.

Ankit.

vcf SNP variant • 1.6k views
5.7 years ago

Using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

java -jar jvarkit-git/dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->!(G.isNoCall() || G.isHomRef())).count()>=20;' input.vcf
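
For readers who want to see what that expression is doing, here is a minimal Python sketch of roughly the same rule: keep a record when at least 20 samples have a called genotype that is not homozygous reference. It is not part of jvarkit; the function names (carries_variant, keep_record, filter_vcf) are made up for illustration, and it assumes a single multi-sample VCF with GT as the first FORMAT key. Partially called genotypes such as ./0 may be treated differently than by the Java expression.

```python
def carries_variant(sample_field):
    """True if the GT is called and has at least one non-reference allele."""
    gt = sample_field.split(":")[0]           # assumes GT is the first FORMAT key
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    # './.' (no call) and '0/0' (hom-ref) are not counted
    return bool(called) and any(a != "0" for a in called)


def keep_record(line, min_samples=20):
    """True if at least min_samples samples carry a non-reference allele."""
    fields = line.rstrip("\n").split("\t")
    samples = fields[9:]                      # sample columns start at column 10
    return sum(carries_variant(s) for s in samples) >= min_samples


def filter_vcf(lines, min_samples=20):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_samples):
            yield line
```

With an uncompressed file this could be used as: open input.vcf, pass the file object to filter_vcf, and write the yielded lines to a new file.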

Thanks for the reply.

I am guessing I need to merge all the VCF files first and then use this.
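
If merging is inconvenient, the "any 20 samples" count can also be sketched directly over the per-sample files. The following is only an illustration under some loud assumptions: each file holds one sample and lists only sites where that sample has a variant, the same variant is written identically (CHROM, POS, REF, ALT) in every file, and the files are uncompressed. The function names are hypothetical. A site missing from a file is treated as "not present", so this cannot tell hom-ref apart from no coverage, which a proper merge with genotypes can.

```python
from collections import Counter


def sites_in_file(lines):
    """Collect the (CHROM, POS, REF, ALT) keys found in one per-sample VCF."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue                          # skip headers and blank lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def shared_sites(per_sample_lines, min_samples=20):
    """Return sites seen in at least min_samples of the given files."""
    counts = Counter()
    for lines in per_sample_lines:
        # each file contributes at most 1 to a site's count
        counts.update(sites_in_file(lines))
    return sorted(site for site, n in counts.items() if n >= min_samples)
```

Each element of per_sample_lines can be an open file handle or a list of lines; no file is treated as a reference, so the order of the files does not matter.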
