Hello, this question is somehow complementary to what I asked yesterday here:
Using bcftools to find unique alt homozygous sites
Now let's say I want to find the SNPs 0/1 unique to the sample D3A350g_bcftools2 (see below)
I know I can use
bcftools view -s D3A350g_bcftools2.bcf -x all_bcftools2_merged.vcf
But there is also the +contrast plugin, and this seems like it could work as well
bcftools +contrast -a NOVELGT -0 ancestor_bcftools2.bcf,D2C150g_bcftools2.bcf,D2C1_bcftools2.bcf,D2C350g_bcftools2.bcf,D2C3_bcftools2.bcf,D3A3_bcftools2.bcf,D4B450g_bcftools2.bcf,D4B4_bcftools2.bcf,D5C150g_bcftools2.bcf,D5C1_bcftools2.bcf,D5C350g_bcftools2.bcf,D5C3_bcftools2.bcf,H2B450g_bcftools2.bcf,H2B4_bcftools2.bcf,H3A450g_bcftools2.bcf,H3A4_bcftools2.bcf,H3C450g_bcftools2.bcf,H3C4_bcftools2.bcf,H5A250g_bcftools2.bcf,H5A2_bcftools2.bcf,H5A450g_bcftools2.bcf,H5A4_bcftools2.bcf,H5C250g_bcftools2.bcf,H5C2_bcftools2.bcf -1 D3A350g_bcftools2.bcf
I can't wrap my mind around what's the correct approach. And on top of that, there is the -isec option. I realize the developers have implemented these three functions for different purposes, but they are not apparent to me. The +contrast and -x approaches give vastly different outputs. I promise you I have read the doc. I just can't find what would make the most sense for my question: "for each of my samples, I want to find its unique SNPs". I would be grateful for any pointers... honestly, I was writing a custom script, but I thought, "why remake the wheels"? But the wheel is hard to comprehend. Thanks for any help.
Example of the vcf
bcftools view -e 'GT = "./."' all_bcftools2_merged.vcf|head -n 50
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##bcftoolsVersion=1.18+htslib-1.18
##bcftoolsCommand=mpileup -Ou -I --fasta-ref vaga.fa ../ancestor.sorted.bam
##reference=file://vaga.fa
##contig=<ID=Chrom_3,length=20354777>
##contig=<ID=Chrom_1,length=18146847>
##contig=<ID=Chrom_5,length=16930519>
##contig=<ID=Chrom_2,length=16274841>
##contig=<ID=Chrom_4,length=15224634>
##contig=<ID=Chrom_6,length=13893210>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of raw reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of raw reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Read Position Bias (closer to 0 is better)">
##INFO=<ID=MQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality Bias (closer to 0 is better)">
##INFO=<ID=BQBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Base Quality Bias (closer to 0 is better)">
##INFO=<ID=MQSBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Mapping Quality vs Strand Bias (closer to 0 is better)">
##INFO=<ID=SCBZ,Number=1,Type=Float,Description="Mann-Whitney U-z test of Soft-Clip Length Bias (closer to 0 is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric, http://samtools.github.io/bcftools/rd-SegBias.pdf">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.18+htslib-1.18
##bcftools_callCommand=call -mv -Ob -o ancestor_bcftools.bcf; Date=Wed Sep 13 10:01:56 2023
##bcftools_viewVersion=1.18+htslib-1.18
##bcftools_viewCommand=view ancestor_bcftools.bcf; Date=Thu Sep 14 18:21:29 2023
##bcftools_viewCommand=view -Ou -o ancestor_bcftools2.bcf; Date=Thu Sep 14 18:21:29 2023
##bcftools_mergeVersion=1.18+htslib-1.18
##bcftools_mergeCommand=merge --threads 24 ancestor_bcftools2.bcf D2C150g_bcftools2.bcf D2C1_bcftools2.bcf D2C350g_bcftools2.bcf D2C3_bcftools2.bcf D3A350g_bcftools2.bcf D3A3_bcftools2.bcf D4B450g_bcftools2.bcf D4B4_bcftools2.bcf D5C150g_bcftools2.bcf D5C1_bcftools2.bcf D5C350g_bcftools2.bcf D5C3_bcftools2.bcf H2B450g_bcftools2.bcf H2B4_bcftools2.bcf H3A450g_bcftools2.bcf H3A4_bcftools2.bcf H3C450g_bcftools2.bcf H3C4_bcftools2.bcf H5A250g_bcftools2.bcf H5A2_bcftools2.bcf H5A450g_bcftools2.bcf H5A4_bcftools2.bcf H5C250g_bcftools2.bcf H5C2_bcftools2.bcf; Date=Mon Sep 25 14:45:31 2023
##bcftools_viewCommand=view -e 'GT = "./."' all_bcftools2_merged.vcf; Date=Thu Oct 12 16:23:22 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ancestor D2C150g D2C1 D2C350g D2C3 D3A350g D3A3 D4B450g D4B4 D5C150g D5C1 D5C350g D5C3 H2B450g H2B4 H3A450g H3A4 H3C450g H3C4 H5A250g H5A2 H5A450g H5A4 H5C250g H5C2
Chrom_3 7253 . T A 221.3 . VDB=0.535543;SGB=-0.693147;RPBZ=-1.50431;MQBZ=1.34343;MQSBZ=-1.02125;BQBZ=0.323777;SCBZ=0.975289;MQ0F=0.00398406;MQ=43;DP=4623;DP4=1443,1221,494,349;AN=50;AC=25 GT:PL 0/1:136,0,255 0/1:97,0,255 0/1:175,0,255 0/1:240,0,255 0/1:255,0,255 0/1:125,0,255 0/1:255,0,255 0/1:208,0,255 0/1:230,0,255 0/1:255,0,255 0/1:255,0,255 0/1:218,0,255 0/1:255,0,255 0/1:177,0,255 0/1:255,0,202 0/1:255,0,255 0/1:241,0,255 0/1:255,0,255 0/1:249,0,255 0/1:136,0,255 0/1:220,0,255 0/1:219,0,255 0/1:255,0,255 0/1:166,0,255 0/1:172,0,255
Chrom_3 7906 . G A 222.406 . VDB=1.00557e-07;SGB=-0.693147;RPBZ=-1.81995;MQBZ=-4.11869;MQSBZ=0.0750985;BQBZ=-2.55896;SCBZ=4.94681;MQ0F=0;MQ=58;DP=2581;DP4=522,513,514,371;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:67,0,255 0/1:93,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:243,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7911 . C T 222.411 . VDB=2.66194e-07;SGB=-0.693147;RPBZ=-0.981365;MQBZ=-4.00625;MQSBZ=0.0765345;BQBZ=-3.22205;SCBZ=5.27961;MQ0F=0;MQ=58;DP=2537;DP4=501,513,511,380;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:45,0,255 0/1:40,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:243,0,239 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7912 . G A 222.405 . VDB=5.09897e-07;SGB=-0.693147;RPBZ=-0.98308;MQBZ=-4.06079;MQSBZ=0.0765345;BQBZ=-3.32505;SCBZ=4.87485;MQ0F=0;MQ=58;DP=2539;DP4=503,515,512,378;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:59,0,255 0/1:90,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:245,0,236 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7949 . G C 222.406 . VDB=0.201825;SGB=-0.693147;RPBZ=1.81576;MQBZ=-4.12627;MQSBZ=-0.328279;BQBZ=0.966014;SCBZ=2.96348;MQ0F=0.00518135;MQ=57;DP=2899;DP4=507,536,593,488;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:206,0,255 0/1:197,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,253 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7972 . A C 222.406 . VDB=0.0949276;SGB=-0.693147;RPBZ=-2.4423;MQBZ=-0.00764329;MQSBZ=0.425233;BQBZ=-2.16904;SCBZ=3.54574;MQ0F=0.0060241;MQ=59;DP=2334;DP4=510,545,346,326;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:76,0,255 0/1:118,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:230,0,245 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7974 . A T 222.409 . VDB=0.075728;SGB=-0.693147;RPBZ=-2.13329;MQBZ=-0.527608;MQSBZ=-0.123898;BQBZ=-1.0772;SCBZ=3.61746;MQ0F=0;MQ=59;DP=2331;DP4=511,542,346,323;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:76,0,255 0/1:120,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:223,0,245 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7980 . C A 222.404 . VDB=0.135845;SGB=-0.693147;RPBZ=-1.43568;MQBZ=-0.50627;MQSBZ=-0.15972;BQBZ=-0.961533;SCBZ=3.49738;MQ0F=0;MQ=59;DP=2335;DP4=509,541,356,318;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:86,0,255 0/1:100,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:235,0,228 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 7983 . C T 222.403 . VDB=0.106311;SGB=-0.693147;RPBZ=-1.30284;MQBZ=-0.492116;MQSBZ=-0.161765;BQBZ=0.152925;SCBZ=3.94198;MQ0F=0;MQ=59;DP=2341;DP4=511,541,357,322;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 1/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:86,0,255 0/1:111,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:233,0,236 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 8004 . T G 222.406 . VDB=0.557767;SGB=-0.693147;RPBZ=0.23356;MQBZ=-1.95341;MQSBZ=-1.52365;BQBZ=-0.360925;SCBZ=4.40809;MQ0F=0;MQ=59;DP=2386;DP4=522,519,356,344;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:98,0,255 0/1:108,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,204 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
Chrom_3 8010 . T C 222.406 . VDB=0.621464;SGB=-0.693147;RPBZ=0.764096;MQBZ=-0.452836;MQSBZ=-1.52365;BQBZ=-0.468479;SCBZ=4.32313;MQ0F=0;MQ=59;DP=2399;DP4=529,513,360,349;AN=50;AC=25 GT:PL 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:98,0,255 0/1:118,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,204 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255 0/1:255,0,255
You're on to the right approach. This GT table should help you with your problem from yesterday as well. I often extract GT, AD, DP in a table of this sort because it's a lot easier to work with this 2D dataset than VCF's 3D nature.
Thank you so much. I really appreciate the help I get here. The idea of AD and DP is great! I will do that.