Hi,
I am analysing some SNPs in a VCF and have found some mutations of interest, but I would like to know what the output in the sample fields means.
The two SNPs I am interested in are these:
1 270574995 . T C 8523.06 PASS AC=64;AF=0.889;AN=72;DP=333;FS=0.000;MQ=60.00;set=Intersection GT:AD:DP:GQ:PGT:PID:PL
1/1:0,20:20:60:1|1:270574995_T_C:900,60,0 1/1:0,13:13:39:1|1:270574995_T_C:585,39,0
0/1:1,5:6:27:0|1:270574995_T_C:207,0,27 1/1:0,4:4:12:1|1:270574995_T_C:180,12,0
1/1:0,12:12:36:1|1:270574995_T_C:540,36,0 1/1:0,16:16:48:1|1:270574995_T_C:720,48,0
0/1:1,7:8:21:0|1:270574995_T_C:291,0,21 1/1:0,19:19:57:1|1:270574995_T_C:855,57,0
1/1:0,29:29:87:1|1:270574995_T_C:1305,87,0 1/1:0,21:21:63:1|1:270574995_T_C:945,63,0
1/1:0,4:4:12:1|1:270574995_T_C:180,12,0 1/1:0,4:4:12:1|1:270574995_T_C:180,12,0
1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 1/1:0,6:6:18:1|1:270574995_T_C:249,18,0
0/1:5,6:11:99:0|1:270574995_T_C:237,0,192 1/1:0,10:10:36:1|1:270574995_T_C:509,36,0
1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 0/1:2,4:6:99:0|1:270574995_T_C:159,0,99
1/1:0,7:7:21:1|1:270574995_T_C:305,21,0 1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 0/0:5,0:5:0:.:.:0,0,92
1/1:0,3:3:9:1|1:270574995_T_C:135,9,0 1/1:0,4:4:12:1|1:270574995_T_C:180,12,0
1/1:0,7:7:21:1|1:270574995_T_C:315,21,0 1/1:0,11:11:33:1|1:270574995_T_C:495,33,0
1/1:0,8:8:24:1|1:270574995_T_C:355,24,0 1/1:0,6:6:18:1|1:270574995_T_C:270,18,0
1/1:0,8:8:24:1|1:270574995_T_C:360,24,0 1/1:0,7:7:21:1|1:270574995_T_C:315,21,0
1/1:0,9:9:27:1|1:270574995_T_C:372,27,0 1/1:0,9:9:27:1|1:270574995_T_C:405,27,0
1/1:0,9:9:27:1|1:270574995_T_C:405,27,0 1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 0/0:4,0:4:12:.:.:0,12,109
1/1:0,8:8:24:1|1:270574995_T_C:360,24,0 1/1:0,11:11:33:1|1:270574995_T_C:495,33,0
1 270574996 . T A 8523.06 PASS AC=64;AF=0.889;AN=72;DP=335;MQ=60.00;set=Intersection GT:AD:DP:GQ:PGT:PID:PL
1/1:0,20:20:60:1|1:270574995_T_C:900,60,0
1/1:0,13:13:39:1|1:270574995_T_C:585,39,0 0/1:1,5:6:27:0|1:270574995_T_C:207,0,27
1/1:0,4:4:12:1|1:270574995_T_C:180,12,0 1/1:0,12:12:36:1|1:270574995_T_C:540,36,0
1/1:0,16:16:48:1|1:270574995_T_C:720,48,0 0/1:1,7:8:21:0|1:270574995_T_C:291,0,21
1/1:0,19:19:57:1|1:270574995_T_C:855,57,0 1/1:0,29:29:87:1|1:270574995_T_C:1305,87,0
1/1:0,21:21:63:1|1:270574995_T_C:945,63,0 1/1:0,4:4:12:1|1:270574995_T_C:180,12,0
1/1:0,4:4:12:1|1:270574995_T_C:180,12,0 1/1:0,6:6:18:1|1:270574995_T_C:270,18,0
1/1:0,5:5:18:1|1:270574995_T_C:249,18,0 0/1:5,6:11:99:0|1:270574995_T_C:237,0,192
1/1:0,12:12:36:1|1:270574995_T_C:509,36,0 1/1:0,6:6:18:1|1:270574995_T_C:270,18,0
0/1:3,4:7:99:0|1:270574995_T_C:159,0,99 1/1:0,7:7:21:1|1:270574995_T_C:305,21,0
1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 0/0:5,0:5:0:.:.:0,0,92 1/1:0,3:3:9:1|1:270574995_T_C:135,9,0
1/1:0,4:4:12:1|1:270574995_T_C:180,12,0 1/1:0,7:7:21:1|1:270574995_T_C:315,21,0
1/1:0,11:11:33:1|1:270574995_T_C:495,33,0 1/1:0,8:8:24:1|1:270574995_T_C:355,24,0
1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 1/1:0,8:8:24:1|1:270574995_T_C:360,24,0
1/1:0,7:7:21:1|1:270574995_T_C:315,21,0 1/1:0,9:9:27:1|1:270574995_T_C:372,27,0
1/1:0,9:9:27:1|1:270574995_T_C:405,27,0 1/1:0,9:9:27:1|1:270574995_T_C:405,27,0
1/1:0,6:6:18:1|1:270574995_T_C:270,18,0 0/0:4,0:4:12:.:.:0,12,109 1/1:0,8:8:24:1|1:270574995_T_C:360,24,0
1/1:0,11:11:33:1|1:270574995_T_C:495,33,0
These are consecutive SNPs: one is very deleterious and the other compensates for it. I would like to know why the sample fields, where I get the genotype for both alleles in each sample, look this way. I would expect these fields to look like, say, 1/1:0,20:20:60:1, but I get 1/1:0,20:20:60:1|1:270574995_T_C:900,60,0. Why is that? I've checked other SNPs and they look as expected.
I would also like to know why the sample fields of the second mutation cite the first mutation (270574995_T_C).
Does anyone know whether this is a special type of output that means something, or should I simply not care about it?
Thanks
You should look at your VCF header. From the FORMAT column, it is evident that the field you're looking for information on is called PID, so look for that in the header's ##FORMAT section.
I looked at the header, and indeed it refers to the PID and PGT fields. I have been reading about what they mean; they relate to physical phasing. From what I have understood, this applies to consecutive or nearby variants. I do not understand what the implications of that are, as I also noticed that the allele frequency (AF) is the same for both SNPs.
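As a rough illustration of the header advice above: each sample string is just the FORMAT keys paired, in order, with colon-separated values, and the header's ##FORMAT lines describe each key. In the Python sketch below, the header lines are abbreviated and illustrative (an assumption; copy the real wording from your own header), and `format_descriptions` and `parse_sample` are hypothetical helper names, not part of any library.

```python
import re

# Illustrative ##FORMAT header lines (assumption: abbreviated descriptions;
# your own VCF header may word these differently).
HEADER = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information">',
    '##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information">',
]

# Captures the ID and the quoted Description of a ##FORMAT line.
FORMAT_RE = re.compile(r'^##FORMAT=<ID=([^,]+),.*Description="([^"]*)"')


def format_descriptions(header_lines):
    """Map each FORMAT ID to its Description, taken from ##FORMAT lines."""
    out = {}
    for line in header_lines:
        m = FORMAT_RE.match(line)
        if m:
            out[m.group(1)] = m.group(2)
    return out


def parse_sample(format_field, sample_field):
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


if __name__ == "__main__":
    descs = format_descriptions(HEADER)
    sample = parse_sample("GT:AD:DP:GQ:PGT:PID:PL",
                          "1/1:0,20:20:60:1|1:270574995_T_C:900,60,0")
    # Print each key, its value in this sample, and its header description.
    for key, value in sample.items():
        print(f"{key:4} {value:16} {descs.get(key, '')}")
```

Run on one of the 0/0 samples (0/0:5,0:5:0:.:.:0,0,92), the same helper returns "." for PGT and PID, i.e. no phasing information was recorded for that call.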
Could this mean that both SNPs are always present as a haplotype and always segregate together?
It means that the variants are located on the same homologous chromosome (the same haplotype) in that sample.
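To make "same homologous chromosome" concrete: within one sample, records that share a PID belong to the same phasing group, and their PGT strings say which haplotype carries each ALT allele. In this output the PID appears to be the position_REF_ALT of the first variant in the group, which would explain why the second record cites 270574995_T_C. The sketch below is a hypothetical helper (not a GATK or pysam API) that, assuming phased PGT values of the form a|b, reports whether the ALT alleles of two records sit on the same haplotype for one sample. It only reads per-sample phasing and says nothing about linkage across the population.

```python
from typing import Dict, Optional, Tuple

FMT = "GT:AD:DP:GQ:PGT:PID:PL"


def parse_sample(format_field: str, sample_field: str) -> Dict[str, str]:
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def parse_pgt(pgt: str) -> Optional[Tuple[int, ...]]:
    """Turn a phased PGT such as '0|1' into allele indices (0, 1).
    Returns None for missing ('.') or unphased values."""
    if "|" not in pgt:
        return None
    try:
        return tuple(int(a) for a in pgt.split("|"))
    except ValueError:
        return None  # e.g. '.|.'


def alt_alleles_in_cis(rec_a: Dict[str, str],
                       rec_b: Dict[str, str]) -> Optional[bool]:
    """For ONE sample, given its parsed fields at two records:
    True  -> at least one haplotype carries the ALT allele of both records,
    False -> the ALT alleles are on different haplotypes,
    None  -> not in the same phasing group, or no usable PGT."""
    pid_a, pid_b = rec_a.get("PID", "."), rec_b.get("PID", ".")
    if pid_a == "." or pid_a != pid_b:
        return None
    hap_a = parse_pgt(rec_a.get("PGT", "."))
    hap_b = parse_pgt(rec_b.get("PGT", "."))
    if hap_a is None or hap_b is None or len(hap_a) != len(hap_b):
        return None
    # Index i in both tuples refers to the same haplotype (same homolog).
    return any(x != 0 and y != 0 for x, y in zip(hap_a, hap_b))


if __name__ == "__main__":
    # Heterozygous sample from the post; identical string at both positions.
    het = parse_sample(FMT, "0/1:1,5:6:27:0|1:270574995_T_C:207,0,27")
    # Hypothetical trans configuration, for contrast only.
    trans = parse_sample(FMT, "0/1:1,5:6:27:1|0:270574995_T_C:207,0,27")
    ref = parse_sample(FMT, "0/0:5,0:5:0:.:.:0,0,92")
    print(alt_alleles_in_cis(het, het))    # True
    print(alt_alleles_in_cis(het, trans))  # False
    print(alt_alleles_in_cis(ref, ref))    # None
```

For the 0/1 samples in the post, both records carry PGT 0|1 with the same PID, so the two ALT alleles are on the same haplotype in those individuals.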