Hello,
I applied freebayes to my different samples, generated a VCF file, and annotated it.
I would like to know how I can determine the frequency of each variant per sample.
Is it AO or DP, or something else?
My VCF file looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sxxxxxxxx9.fastq Sxxxxxxxx8.fastq Sxxxxxxxx3.fastq Sxxxxxxxx0.fastq Sxxxxxxx10.fastq Sxxxxxxx49.fastq Sxxxxxxxx5.fastq Sxxxxxxxx2.fastq Sxxxxxxxx7.fastq Sxxxxxxxx1.fastq Sxxxxxx341.fastq Sxxxxxx746.fastq Sxxxxxx887.fastq Sxxxxxxx72.fastq Sxxxxxx413.fastq Sxxxxxxx08.fastq Sxxxxxx494.fastq Sxxxxxxx84.fastq
DLXXXXX.4 687 . C T 4.38769E-13 . AB=0;ABP=0;AC=0;AF=0;AN=36;AO=39;CIGAR=1X;DP=2179;DPB=2179;DPRA=0.986735;EPP=63.6445;EPPR=370.696;GTI=0;LEN=1;MEANALT=1.75;MQM=60;MQMR=59.9722;NS=18;NUMALT=1;ODDS=135.142;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=259;QR=40628;RO=2119;RPL=0;RPP=87.6977;RPPR=4604.36;RPR=39;RUN=1;SAF=3;SAP=63.6445;SAR=36;SRF=760;SRP=370.696;SRR=1359;TYPE=snp;technology.Nanopore=1;ANN=T|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.497C>T|p.Pro166Leu|497/561|497/561|166/186||,T|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-132C>T|||||132|,T|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1389C>T|||||1389|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029 0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731 0/0:127:124,2:124:2482:2:8:0,-37.2564,-222.749 0/0:125:122,2:122:2064:2:21:0,-35.4144,-183.924 0/0:119:117,2:117:2089:2:15:0,-34.3976,-186.748 0/0:105:102,1:102:2034:1:10:0,-30.0894,-182.259 0/0:136:130,5:130:2349:5:32:0,-37.7391,-208.61 0/0:107:103,1:103:2029:1:12:0,-30.2088,-181.628 0/0:124:119,3:119:2538:3:28:0,-34.1737,-225.986 0/0:119:116,1:116:2262:1:12:0,-34.1188,-202.579 0/0:113:111,2:111:1995:2:12:0,-32.8764,-178.568 0/0:126:125,0:125:2498:0:0:0,-37.6287,-224.944 0/0:114:110,3:110:2030:3:21:0,-32.0964,-180.914 0/0:119:117,0:117:2379:0:0:0,-35.2205,-214.238 0/0:135:129,4:129:2247:4:16:0,-38.587,-200.899 0/0:121:119,1:119:2436:1:6:0,-35.5636,-218.82 0/0:131:129,1:129:2527:1:9:0,-38.3089,-226.709 0/0:124:120,3:120:2347:3:15:0,-35.6842,-209.993
DLXXXXX.4 688 . T C 0.0 . AB=0;ABP=0;AC=0;AF=0;AN=36;AO=139;CIGAR=1X;DP=2122;DPB=2122;DPRA=0;EPP=216.861;EPPR=211.476;GTI=0;LEN=1;MEANALT=1.77778;MQM=60;MQMR=59.9424;NS=18;NUMALT=1;ODDS=117.062;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=922;QR=30518;RO=1962;RPL=0;RPP=304.845;RPPR=4263.44;RPR=139;RUN=1;SAF=11;SAP=216.861;SAR=128;SRF=764;SRP=211.476;SRR=1198;TYPE=snp;technology.Nanopore=1;ANN=C|synonymous_variant|LOW|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.498T>C|p.Pro166Pro|498/561|498/561|166/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-131T>C|||||131|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1388T>C|||||1388|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:112:103,8:103:1774:8:56:0,-28.3743,-154.704 0/0:112:102,9:102:1387:9:56:0,-28.3903,-119.483 0/0:125:118,5:118:1862:5:36:0,-33.7889,-164.406 0/0:117:111,6:111:1541:6:51:0,-30.5455,-134.141 0/0:116:105,8:105:1632:8:56:0,-28.9958,-141.906 0/0:100:95,5:95:1575:5:35:0,-26.883,-138.685 0/0:129:118,11:118:1771:11:74:0,-32.1056,-152.785 0/0:106:101,3:101:1478:3:22:0,-29.3011,-131.116 0/0:123:115,7:115:1920:7:44:0,-32.7719,-168.922 0/0:116:112,4:112:1792:4:26:0,-32.5145,-159.016 0/0:112:100,10:100:1416:10:65:0,-27.28,-121.655 0/0:124:118,6:118:1846:6:33:0,-34.3027,-163.038 0/0:113:97,15:97:1540:15:86:0,-26.0529,-130.944 0/0:118:112,5:112:1813:5:27:0,-32.7788,-160.836 0/0:130:117,11:117:1679:11:62:0,-33.0099,-145.591 0/0:118:110,7:110:1816:7:66:0,-29.2509,-157.556 0/0:129:121,8:121:1809:8:52:0,-34.0879,-158.206 0/0:122:107,11:107:1867:11:75:0,-28.7803,-161.359
DLXXXXX.4 689 . G A 7.66643E-14 . AB=0;ABP=0;AC=0;AF=0;AN=36;AO=80;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=60.4457;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=491;QR=31152;RO=2017;RPL=0;RPP=176.728;RPPR=4382.87;RPR=80;RUN=1;SAF=63;SAP=60.4457;SAR=17;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=A|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>A|p.Val167Ile|499/561|499/561|167/186||,A|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>A|||||130|,A|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>A|||||1387|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,6:108:1864:6:35:0,-31.2324,-164.69 0/0:112:102,1:102:1407:1:6:0,-30.4676,-125.625 0/0:130:125,4:125:1886:4:25:0,-36.6049,-167.56 0/0:121:115,2:115:1587:2:21:0,-33.3197,-140.985 0/0:120:111,6:111:1620:6:35:0,-32.1183,-142.706 0/0:103:95,4:95:1533:4:23:0,-27.7357,-135.986 0/0:132:119,8:119:1850:8:39:0,-34.7647,-163.049 0/0:110:96,8:96:1450:8:50:0,-26.8143,-126.069 0/0:132:121,3:121:1993:3:12:0,-36.2692,-178.336 0/0:119:109,6:109:1723:6:32:0,-31.8144,-152.233 0/0:114:106,4:106:1526:4:18:0,-31.5246,-135.78 0/0:121:116,2:116:1825:2:9:0,-34.7402,-161.873 0/0:114:103,7:103:1532:7:74:0,-26.4452,-131.259 0/0:120:115,3:115:1840:3:31:0,-32.6906,-162.883 0/0:133:122,6:122:1906:6:20:0,-36.7747,-169.812 0/0:120:114,2:114:1884:2:5:0,-34.4911,-169.191 0/0:133:121,7:121:1785:7:50:0,-34.0334,-156.212 0/0:122:119,1:119:1941:1:6:0,-35.5786,-174.246
DLXXXXX.4 689 . G C 7.66643E-14 . AB=0;ABP=0;AC=0;AF=0;AN=36;AO=59;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=98.7391;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=602;QR=31152;RO=2017;RPL=0;RPP=131.127;RPPR=4382.87;RPR=59;RUN=1;SAF=4;SAP=98.7391;SAR=55;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=C|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>C|p.Val167Leu|499/561|499/561|167/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>C|||||130|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>C|||||1387|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,2:108:1864:2:29:0,-30.4817,-165.225 0/0:112:102,7:102:1407:7:57:0,-27.665,-121.037 0/0:130:125,1:125:1886:1:9:0,-37.1143,-168.997 0/0:121:115,2:115:1587:2:20:0,-33.4122,-141.075 0/0:120:111,2:111:1620:2:28:0,-31.4799,-143.332 0/0:103:95,2:95:1533:2:11:0,-28.212,-137.064 0/0:132:119,4:119:1850:4:40:0,-33.4161,-162.954 0/0:110:96,5:96:1450:5:42:0,-26.6151,-126.785 0/0:132:121,7:121:1993:7:91:0,-30.2847,-171.225 0/0:119:109,4:109:1723:4:72:0,-27.4863,-148.627 0/0:114:106,3:106:1526:3:33:0,-29.822,-134.428 0/0:121:116,1:116:1825:1:10:0,-34.3175,-161.782 0/0:114:103,3:103:1532:3:34:0,-28.8445,-134.857 0/0:120:115,2:115:1840:2:7:0,-34.6178,-165.044 0/0:133:122,5:122:1906:5:56:0,-33.1551,-166.568 0/0:120:114,4:114:1884:4:36:0,-32.2382,-166.401 0/0:133:121,3:121:1785:3:18:0,-35.7119,-159.09 0/0:122:119,2:119:1941:2:9:0,-35.6246,-173.977
I don't understand what "the frequency of each variant per sample" means. There is only one genotype at the intersection of a variant and a sample. That genotype either contains the ALT allele or it does not.
What I'm looking for is the number of reads that support a variant in each sample. Is it DP?
So this is not the "frequency of each variant per sample".
Should I correct the title of my issue?
cross-posted: https://github.com/freebayes/freebayes/issues/792
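For the question about reads supporting a variant per sample: as far as I can tell from the FORMAT definitions that freebayes writes into the VCF header, the per-sample AO is the alternate allele observation count, RO is the reference allele observation count, and DP is the total read depth for that sample. So the per-sample AO would be the number of supporting reads, and AO/DP the fraction of reads carrying the ALT allele. Below is a minimal Python sketch under those assumptions. The function name `per_sample_alt_support` is made up for illustration, the example record is the first line of the file above cut down to two samples with a shortened INFO field, and the code assumes one ALT allele per record (as in the records shown).

```python
def per_sample_alt_support(header_line, record_line):
    """Return {sample: (AO, DP, AO/DP)} for one single-ALT VCF record.

    Assumes the FORMAT column contains AO and DP, as in the freebayes
    output shown in this thread. Samples with missing values are skipped.
    """
    # Real VCF files are tab-separated; split() also handles the
    # space-separated text pasted in the post.
    samples = header_line.lstrip("#").split()[9:]
    fields = record_line.split()
    keys = fields[8].split(":")          # e.g. GT, DP, AD, RO, QR, AO, QA, GL

    result = {}
    for name, cell in zip(samples, fields[9:]):
        values = dict(zip(keys, cell.split(":")))
        ao, dp = values.get("AO", "."), values.get("DP", ".")
        if ao == "." or dp == ".":
            continue                     # no call for this sample
        ao, dp = int(ao), int(dp)
        fraction = ao / dp if dp else 0.0
        result[name] = (ao, dp, fraction)
    return result


# Header and first record from the post, reduced to two samples.
# The INFO field is shortened here purely to keep the example readable.
header = ("#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
          "Sxxxxxxxx9.fastq Sxxxxxxxx8.fastq")
record = ("DLXXXXX.4 687 . C T 4.38769E-13 . AO=39;DP=2179 "
          "GT:DP:AD:RO:QR:AO:QA:GL "
          "0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029 "
          "0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731")

result = per_sample_alt_support(header, record)
for sample, (ao, dp, fraction) in result.items():
    print(f"{sample}\tAO={ao}\tDP={dp}\tAO/DP={fraction:.4f}")
```

Note that RO + AO can be lower than DP, since DP also counts reads that support neither allele. If you want the ALT fraction among reads supporting REF or ALT only, AO/(RO+AO) is the alternative denominator.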