Yes, working with multi-sample VCFs can be difficult because most filtering methods only deal with single-sample VCFs. Also, the VCF format is not overly amenable to storing multiple samples. For example, the QUAL field can only hold a single value yet has to represent all of the samples. According to one of my mentors, this is why people then go off and develop custom solutions, as I repeatedly do with VCF data.
I believe what you want may be possible with a program called SnpEff, but here's my own solution:
1, filter for your rs ID of interest (must be set in ID field)
bcftools view --include 'ID=="rs71507462"' ProjectMerge.rsID.bcf
1 762592 rs71507462 C G 3684.68 VQSRTrancheSNP99.00to99.90;LowQual;SNP_SPECIFIC_FILTERS DB;Dels=0;FS=0;HaplotypeScore=0;MQ=41.99;MQ0=1;NEGATIVE_TRAIN_SITE;QD=27.98;VQSLOD=-3.612;culprit=MQ;set=FilteredInAll;BaseQRankSum=-0.72;MQRankSum=-0.72;ReadPosRankSum=-0.72;InbreedingCoeff=-0.1805;DP=2013;AF=0.81;MLEAC=34;MLEAF=0.81;AN=594;AC=514 GT:AD:DP:GQ:PL ./.:.:.:.:. 1/1:1,5:6:15:196,15,0 1/1:0,15:15:30:350,30,0 0/1:4,2:6:47:47,0,124 1/1:0,5:5:12:145,12,0 1/1:0,5:5:12:132,12,0 1/1:0,5:5:12:143,12,0 1/1:0,6:6:15:179,15,0 1/1:0,6:6:12:166,12,0 1/1:0,8:8:18:213,18,0 1/1:0,8:8:15:183,15,0 1/1:4,14:18:27:335,27,0 0/1:3,7:10:79:163,0,79 ./.:.:.:.:. 1/1:0,8:8:15:168,15,0 1/1:1,14:15:33:389,33,0 0/1:6,3:9:54:54,0,149 1/1:3,5:8:12:150,12,0 1/1:0,6:6:15:195,15,0 1/1:1,4:5:6:72,6,0 1/1:0,3:3:6:52,6,0 1/1:0,7:7:18:207,18,0 ./.:.:.:.:. ./.:.:.:.:. ./.:.:.:.:. 1/1:1,4:5:6:63,6,0 1/1:1,3:4:6:80,6,0
2, 'melt' the the data into long format and ensure that we output the genotype (GT) and the genotype quality (GQ)
bcftools view --include 'ID=="rs71507462"' ProjectMerge.rsID.bcf | bcftools query -f'[%CHROM:%POS:%REF:%ALT:%SAMPLE:%GT:%GQ\n]'
1:762592:C:G:16422:./.:.
1:762592:C:G:16427:1/1:15
1:762592:C:G:16451:1/1:30
1:762592:C:G:16456:0/1:47
1:762592:C:G:17412:1/1:12
1:762592:C:G:17413:1/1:12
1:762592:C:G:17417:1/1:12
1:762592:C:G:17418:1/1:15
1:762592:C:G:17422:1/1:12
1:762592:C:G:17423:1/1:18
1:762592:C:G:17427:1/1:15
1:762592:C:G:17428:1/1:27
1:762592:C:G:17432:0/1:79
1:762592:C:G:17433:./.:.
1:762592:C:G:17437:1/1:15
1:762592:C:G:17438:1/1:33
3, use a simple awk
filter on the 7th column (GQ value) to select those GQs > 10, then tabulate the genotypes
bcftools view --include 'ID=="rs71507462"' ProjectMerge.rsID.bcf | bcftools query -f'[%CHROM:%POS:%REF:%ALT:%SAMPLE:%GT:%GQ\n]' | awk -F ":" '$7>10' | cut -f6 -d":" | sort | uniq -c
4 0/0
51 0/1
135 1/1
4, same as #3 but filter for GQ > 30
bcftools view --include 'ID=="rs71507462"' ProjectMerge.rsID.bcf | bcftools query -f'[%CHROM:%POS:%REF:%ALT:%SAMPLE:%GT:%GQ\n]' | awk -F ":" '$7>30' | cut -f6 -d":" | sort | uniq -c
33 0/1
5 1/1
-----------------------------------------------
--------------------------------------------
This script (above) is only valid for a single variant lookup. If your output data for more than one rs ID at the same time, you'll have to modify it. For example, you can 'tally up' (count up) the genotypes of each variant with GQ > 30 across all individuals as follows:
bcftools view ProjectMerge.rsID.bcf | head -100 | bcftools query -f'[%ID:%GT:%GQ\n]' | awk -F ":" '$3>30' | cut -f1-2 -d":" | sort | uniq -c
31 rs140739101:0/0
1 rs140739101:0/1
16 rs142004627:0/0
1 rs142004627:0/1
1 rs150690004:1/1
23 rs190717287:0/0
1 rs190717287:0/1
1 rs200505207:0/1
1 rs62639105:0/0
21 rs62639105:0/1
4 rs75062661:0/0
4 rs75062661:0/1
214 rs75062661:1/1
.
bcftools view ProjectMerge.rsID.bcf | head -100 | bcftools query -f'[%CHROM:%POS:%REF:%ALT:%ID:%GT:%GQ\n]' | awk -F ":" '$7>30' | cut -f1-6 -d":" | sort | uniq -c
1 1:66162:A:T:rs62639105:0/0
21 1:66162:A:T:rs62639105:0/1
31 1:69428:T:G:rs140739101:0/0
1 1:69428:T:G:rs140739101:0/1
16 1:69453:G:A:rs142004627:0/0
1 1:69453:G:A:rs142004627:0/1
1 1:69496:G:A:rs150690004:1/1
4 1:69511:A:G:rs75062661:0/0
4 1:69511:A:G:rs75062661:0/1
214 1:69511:A:G:rs75062661:1/1
23 1:69534:T:C:rs190717287:0/0
1 1:69534:T:C:rs190717287:0/1
1 1:69761:A:T:rs200505207:0/1
----------------------------
Other scripts of mine that you may find of use:
Kevin
Thank you so much for the help! I'm trying to more so get my file to be like the following:
FamilyID IndividualID rs1 rs2 rs3 rs4 rs5 ... rsN fam1 1 0/0 0/1 0/1 0/0 0/0 fam1 2 0/2 0/1 0/1 0/1 0/0
if that makes sense - kind of like your first example but a bit more parallel. The thing is, I have to filter on the GQ before hand to prevent any false positives but since the vcf file contains ~200 individuals and I'd want multiple SNPs, that's what has been really confusing for me. I inherited the data from the previous student and I'm not used to this kind of formatting.
Is Family ID even encoded in your VCF? What exactly are you aiming to do?