Hello, I have a vcf file with 11 samples.
I would like to keep sites where, for all samples, this condition is true:
MAX(AD)/SUM(AD) <= 0.6 & MAX(AD)/SUM(AD) >=0.4
In other words, I want to keep sites where, for all samples, the allele frequencies to support the SNP are between 0.4 et 0.6.
I am using this command
bcftools view -i 'MAX(AD)/SUM(AD) <= 0.6 & MAX(AD)/SUM(AD) >=0.4' biallelicSNP.vcf.gz > biallelicSNP.freq04_06.vcf
However, this results in a very low number of sites (compared to the initial number). Therefore, I am wondering if bcftools is not doing here something else than I think it does. What's your opinion? Thank you
EDIT: I think I was wrong, and the proper way of doing it is through VAF tag, so I did
bcftools +fill-tags biallelicSNP.vcf.gz -- -t FORMAT/VAF > biallelicSNP.VAF.vcf
bcftools view -i 'VAF>= 0.4 & VAF <= 0.6' biallelicSNP.VAF.vcf > biallelicSNP.freq04_06.vcf
I would still apreciate some opinion, especially on why the first method didn't work.
Thanks!
EDIT2: actually it doesn't work as it doesn't check the condition is satisfied for ALL samples.
Thanks Pierre, I am going to use it for comparison
let's try...
ok. bablablablablabalabala weird bug
Both commands work equally, at least for me
It is the same command. Just posted two different ways.