java -jar dist/jvarkit.jar vcffilterjdk -e 'return variant.getGenotypes().stream().filter(G->G.hasAD() && G.getAD().length==2).allMatch(G->{final int[] ad=G.getAD();double sum=ad[0]+ad[1];int max=Math.max(ad[0],ad[1]);if(sum==0) return false; double f = max/sum;return f<=0.6 && f>=0.4;});' in.vcf
Hello, I have a vcf file with 11 samples.
I would like to keep sites where, for all samples, this condition is true:
MAX(AD)/SUM(AD) <= 0.6 & MAX(AD)/SUM(AD) >=0.4
In other words, I want to keep sites where, for all samples, the allele frequencies to support the SNP are between 0.4 et 0.6.
I am using this command
bcftools view -i 'MAX(AD)/SUM(AD) <= 0.6 & MAX(AD)/SUM(AD) >=0.4' biallelicSNP.vcf.gz > biallelicSNP.freq04_06.vcf
However, this results in a very low number of sites (compared to the initial number). Therefore, I am wondering if bcftools is not doing here something else than I think it does. What's your opinion? Thank you
EDIT: I think I was wrong, and the proper way of doing it is through VAF tag, so I did
bcftools +fill-tags biallelicSNP.vcf.gz -- -t FORMAT/VAF > biallelicSNP.VAF.vcf
bcftools view -i 'VAF>= 0.4 & VAF <= 0.6' biallelicSNP.VAF.vcf > biallelicSNP.freq04_06.vcf
I would still apreciate some opinion, especially on why the first method didn't work.
Thanks!
EDIT2: actually it doesn't work as it doesn't check the condition is satisfied for ALL samples.
Thanks Pierre, I am going to use it for comparison
let's try...
ok. bablablablablabalabala weird bug
Both commands work equally, at least for me
It is the same command. Just posted two different ways.