Hello, I have a multisample VCF, 10 samples + one reference sample. I am interesting in group of samples. For examples, if there is this SNP: Chrom_3:1258955 0/1 for sample 1, sample2, sample3, sample4 but that the genotype of all other samples is 0/0, I would like to find it.
So far I have tried
bcftools view -i 'GT[0-3]="alt" & GT[4-10]="ref"' merged_snps_only.bcf.gz
However, this returns nothing. but I do know that there are lines in the vcf that actually satisfies this condition. I extracted the genotypes only and used awk to check that indeed there are SNP that are 0/1 for the first 4 samples and 0/0 for all the others.
But I am puzzled why bcftools doesn't return me the same as awk.
Any idea?
Thanks all :)
EDIT: this is what I did and showed me my condition is indeed satisfied
bcftools query -e'GT ="."' -f'%CHROM %POS [ %GT]\n' merged_snps_only.bcf.gz > merged_genotypes.only
perl -pie 's/\|/\//g' merged_genotypes.only #for convenience
awk '$3=="0/1" && $4=="0/1" && $5=="0/1" && $6=="0/1" && $7=="0/0" && $8=="0/0" && $9=="0/0" && $10=="0/0" && $11=="0/0" && $12=="0/0" && $13=="0/0"' merged_genotypes.only > group1_4ALT_others_ref
Output of awk
Chrom_3 1163847 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163852 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163862 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163864 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163867 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163870 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1163880 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1343041 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1378584 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Chrom_3 1565128 0/1 0/1 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/0 0/0
Then doing
awk '{print $1"\t"$2}' group1_4ALT_others_ref > group1_4ALT_others_ref.coordinates
bcftools view -R group1_4ALT_others_ref.coordinates merged_snps_only.bcf.gz > group1_4ALT_others_ref.vcf
Gives me the vcf I need. I still suspect there must be a less convoluted way of doing this.
Hello Pierre, this works like a charm.
I am still wondering what's wrong with my bcftools command.
Could you give guidance as how to learn to use your program and form queries with it? It seems insanely powerful but I am a bit at loss with the syntax. Thanks
Edit, how do I exclude the sites which have more than 2 alleles? (phased or not).