In the man page for bcftools expressions, the difference between | and || is described as follows:
QUAL>10 | FMT/GQ>10 .. true for sites with QUAL>10 or a sample with GQ>10, but selects only samples with GQ>10
QUAL>10 || FMT/GQ>10 .. true for sites with QUAL>10 or a sample with GQ>10, plus selects all samples at such sites
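To pin down what the two descriptions say, here is a minimal sketch in Python of one possible reading of them. It is a toy model of the quoted wording only, not bcftools' actual implementation. The function name `evaluate`, the sample names, and the QUAL/GQ values are all hypothetical.

```python
# Toy model of the quoted man-page wording; NOT how bcftools is implemented.
from typing import Dict, List, Tuple


def evaluate(qual: float, gq: Dict[str, float], op: str) -> Tuple[bool, List[str]]:
    """Model `QUAL>10 <op> FMT/GQ>10` for one site.

    Returns (site_is_true, selected_samples), with samples sorted by name.
    """
    site_term = qual > 10
    sample_hits = sorted(name for name, value in gq.items() if value > 10)

    # Both operators agree on the site: true if QUAL>10 or any sample has GQ>10.
    site_is_true = site_term or bool(sample_hits)
    if not site_is_true:
        return False, []

    if op == "|":
        # "but selects only samples with GQ>10"
        return True, sample_hits
    if op == "||":
        # "plus selects all samples at such sites"
        return True, sorted(gq)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical site: QUAL=50 and three samples with made-up GQ values.
gq_values = {"S1": 5, "S2": 30, "S3": 8}
print(evaluate(50, gq_values, "|"))   # (True, ['S2'])
print(evaluate(50, gq_values, "||"))  # (True, ['S1', 'S2', 'S3'])
```

Under this reading, both operators mark exactly the same sites as true. They differ only in which samples are flagged within such a site.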
However, I don't see any difference in the output from these two commands, which filter out sites with more than 2 alleles or with genotype depth <= 20:
bcftools filter unfiltered.bcf.gz -e 'N_ALT >= 2 | FMT/DP<=20' | bcftools query -l | wc -l #672
bcftools filter unfiltered.bcf.gz -e 'N_ALT >= 2 || FMT/DP<=20' | bcftools query -l | wc -l #672
As I understand the manual, the first command should filter out samples with depth <=20. I can test separately to confirm that there are indeed samples with read depth <=20 at many markers:
bcftools view -H unfiltered.vcf.gz | wc -l #9989
bcftools filter unfiltered.bcf.gz -e 'FMT/DP<=20' | bcftools view -H | wc -l #97
Any advice on how I am misinterpreting | and || would be appreciated. Thanks!
This question is related to a broader question regarding filtering by sample attributes, instead of site attributes, as described here: https://github.com/samtools/bcftools/issues/1391. Any suggestions on alternative methods for filtering by samples would be appreciated.