Hello,
I have a vcf with 20 samples. I want to find for each sample the sites that are 1/1, only in that sample (so other samples must have genotypes 0/1 or 0/0). I know I can use filters such as GT="aa"'
However, how do I say GT="aa" for sample 1 and 'is not "aa"' in all other samples? Is there an easy way without manually specifying the GT of each sample, which would end up in a very long command? I know you can specify the exact genotype you expect per sample, but I could end up with a vcf with 50 samples, which might be tedious. I could write a Python script that would print the commands.
How would you approach this? There must be a trivial command, but I don't find it.
Thanks a lot
EDIT: of course, it will be easier for you with a sample of the data. Please note I have excluded sites with unknown genotypes but I might not filter them out. It's still an open question. I manually edited this snippet to include a line with a "1/1"
bcftools view -e 'GT = "./."' all_bcftools2_merged.vcf|head -n 50
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.18+htslib-1.18
##bcftoolsCommand=mpileup -Ou -I --fasta-ref vaga.fa ../ancestor.sorted.bam
##reference=file://vaga.fa
##contig=<ID=Chrom_3,length=20354777>
##contig=<ID=Chrom_1,length=18146847>
##contig=<ID=Chrom_5,length=16930519>
##contig=<ID=Chrom_2,length=16274841>
##contig=<ID=Chrom_4,length=15224634>
##contig=<ID=Chrom_6,length=13893210>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.18+htslib-1.18
##bcftools_callCommand=call -mv -Ob -o ancestor_bcftools.bcf; Date=Wed Sep 13 10:01:56 2023
##bcftools_viewVersion=1.18+htslib-1.18
##bcftools_viewCommand=view ancestor_bcftools.bcf; Date=Thu Sep 14 18:21:29 2023
##bcftools_viewCommand=view -Ou -o ancestor_bcftools2.bcf; Date=Thu Sep 14 18:21:29 2023
##bcftools_mergeVersion=1.18+htslib-1.18
##bcftools_mergeCommand=merge --threads 24 ancestor_bcftools2.bcf D2C150g_bcftools2.bcf D2C1_bcftools2.bcf D2C350g_bcftools2.bcf D2C3_bcftools2.bcf D3A350g_bcftools2.bcf D3A3_bcftools2.bcf D4B450g_bcftools2.bcf D4B4_bcftools2.bcf D5C150g_bcftools2.bcf D5C1_bcftools2.bcf D5C350g_bcftools2.bcf D5C3_bcftools2.bcf H2B450g_bcftools2.bcf H2B4_bcftools2.bcf H3A450g_bcftools2.bcf H3A4_bcftools2.bcf H3C450g_bcftools2.bcf H3C4_bcftools2.bcf H5A250g_bcftools2.bcf H5A2_bcftools2.bcf H5A450g_bcftools2.bcf H5A4_bcftools2.bcf H5C250g_bcftools2.bcf H5C2_bcftools2.bcf; Date=Mon Sep 25 14:45:31 2023
##bcftools_viewCommand=view -e 'GT = "./."' all_bcftools2_merged.vcf; Date=Thu Oct 12 16:23:22 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ancestor D2C150g D2C1 D2C350g D2C3 D3A350g D3A3 D4B450g D4B4 D5C150g D5C1 D5C350g D5C3 H2B450g H2B4 H3A450g H3A4 H3C450g H3C4 H5A250g H5A2 H5A450g H5A4 H5C250g H5C2
Chrom_3 7253 . T A 221.3 . VDB=0.535543;SGB=-0.693147;RPBZ=-1.50431;MQBZ=1.34343;MQSBZ=-1.02125;BQBZ=0.323777;SCBZ=0.975289;MQ0F=0.00398406;MQ=43;DP=4623;DP4=1443,1221,494,349;AN=50;AC=25 GT:PL 0/1:136,0,255 0/1:97,0,255 0/1:175,0,255 0/1:240,0,255 0/1:255,0,255 0/1:125,0,255 0/1:255,0,255 0/1:208,0,255 0/1:230,0,255 0/1:255,0,255 0/1:255,0,255 0/1:218,0,255 0/1:255,0,255 0/1:177,0,255 0/1:255,0,202 0/1:255,0,255 0/1:241,0,255 0/1:255,0,255 0/1:249,0,255 0/1:136,0,255 0/1:220,0,255 0/1:219,0,255 0/1:255,0,255 0/1:166,0,255 0/1:172,0,255
Chrom_3 7906 . G A 222.406 . VDB=1.00557e-07;SGB=-0.693147;RPBZ=-1.81995;MQBZ=-4.11869;MQSBZ=0.0750985;BQBZ=-2.55896;SCBZ=4.94681;MQ0F=0;MQ=58;DP=2581;DP4=522,513,514,371;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:67,0,255 0/1:93,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:243,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7911 . C T 222.411 . VDB=2.66194e-07;SGB=-0.693147;RPBZ=-0.981365;MQBZ=-4.00625;MQSBZ=0.0765345;BQBZ=-3.22205;SCBZ=5.27961;MQ0F=0;MQ=58;DP=2537;DP4=501,513,511,380;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:45,0,255 0/1:40,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:243,0,239 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7912 . G A 222.405 . VDB=5.09897e-07;SGB=-0.693147;RPBZ=-0.98308;MQBZ=-4.06079;MQSBZ=0.0765345;BQBZ=-3.32505;SCBZ=4.87485;MQ0F=0;MQ=58;DP=2539;DP4=503,515,512,378;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:59,0,255 0/1:90,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:245,0,236 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7949 . G C 222.406 . VDB=0.201825;SGB=-0.693147;RPBZ=1.81576;MQBZ=-4.12627;MQSBZ=-0.328279;BQBZ=0.966014;SCBZ=2.96348;MQ0F=0.00518135;MQ=57;DP=2899;DP4=507,536,593,488;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:206,0,255 0/1:197,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,253 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7972 . A C 222.406 . VDB=0.0949276;SGB=-0.693147;RPBZ=-2.4423;MQBZ=-0.00764329;MQSBZ=0.425233;BQBZ=-2.16904;SCBZ=3.54574;MQ0F=0.0060241;MQ=59;DP=2334;DP4=510,545,346,326;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:76,0,255 0/1:118,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:230,0,245 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7974 . A T 222.409 . VDB=0.075728;SGB=-0.693147;RPBZ=-2.13329;MQBZ=-0.527608;MQSBZ=-0.123898;BQBZ=-1.0772;SCBZ=3.61746;MQ0F=0;MQ=59;DP=2331;DP4=511,542,346,323;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:76,0,255 0/1:120,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:223,0,245 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7980 . C A 222.404 . VDB=0.135845;SGB=-0.693147;RPBZ=-1.43568;MQBZ=-0.50627;MQSBZ=-0.15972;BQBZ=-0.961533;SCBZ=3.49738;MQ0F=0;MQ=59;DP=2335;DP4=509,541,356,318;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:86,0,255 0/1:100,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:235,0,228 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7983 . C T 222.403 . VDB=0.106311;SGB=-0.693147;RPBZ=-1.30284;MQBZ=-0.492116;MQSBZ=-0.161765;BQBZ=0.152925;SCBZ=3.94198;MQ0F=0;MQ=59;DP=2341;DP4=511,541,357,322;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 1/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:86,0,255 0/1:111,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:233,0,236 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 8004 . T G 222.406 . VDB=0.557767;SGB=-0.693147;RPBZ=0.23356;MQBZ=-1.95341;MQSBZ=-1.52365;BQBZ=-0.360925;SCBZ=4.40809;MQ0F=0;MQ=59;DP=2386;DP4=522,519,356,344;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:98,0,255 0/1:108,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,204 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 8010 . T C 222.406 . VDB=0.621464;SGB=-0.693147;RPBZ=0.764096;MQBZ=-0.452836;MQSBZ=-1.52365;BQBZ=-0.468479;SCBZ=4.32313;MQ0F=0;MQ=59;DP=2399;DP4=529,513,360,349;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:98,0,255 0/1:118,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,204 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
EDIT2: edited for spelling and grammar, sorry not a native speaker I hope my wording makes sense.
Would you please head -n 20 to see how is your data?
Of course, sorry, I am going to edit the original post