frequency of each variant per sample
7 months ago

Hello,

I applied freebayes to my different samples, generated a VCF file, and annotated it.

I would like to know how I can determine the frequency of each variant per sample.

Is it AO or DP, or something else?

My vcf file looks like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sxxxxxxxx9.fastq Sxxxxxxxx8.fastq Sxxxxxxxx3.fastq Sxxxxxxxx0.fastq Sxxxxxxx10.fastq Sxxxxxxx49.fastq Sxxxxxxxx5.fastq Sxxxxxxxx2.fastq Sxxxxxxxx7.fastq Sxxxxxxxx1.fastq Sxxxxxx341.fastq Sxxxxxx746.fastq Sxxxxxx887.fastq Sxxxxxxx72.fastq Sxxxxxx413.fastq Sxxxxxxx08.fastq Sxxxxxx494.fastq Sxxxxxxx84.fastq
DLXXXXX.4   687 .   C   T   4.38769E-13 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=39;CIGAR=1X;DP=2179;DPB=2179;DPRA=0.986735;EPP=63.6445;EPPR=370.696;GTI=0;LEN=1;MEANALT=1.75;MQM=60;MQMR=59.9722;NS=18;NUMALT=1;ODDS=135.142;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=259;QR=40628;RO=2119;RPL=0;RPP=87.6977;RPPR=4604.36;RPR=39;RUN=1;SAF=3;SAP=63.6445;SAR=36;SRF=760;SRP=370.696;SRR=1359;TYPE=snp;technology.Nanopore=1;ANN=T|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.497C>T|p.Pro166Leu|497/561|497/561|166/186||,T|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-132C>T|||||132|,T|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1389C>T|||||1389|WARNING_TRANSCRIPT_INCOMPLETE    GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029  0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731 0/0:127:124,2:124:2482:2:8:0,-37.2564,-222.749  0/0:125:122,2:122:2064:2:21:0,-35.4144,-183.924 0/0:119:117,2:117:2089:2:15:0,-34.3976,-186.748 0/0:105:102,1:102:2034:1:10:0,-30.0894,-182.259 0/0:136:130,5:130:2349:5:32:0,-37.7391,-208.61  0/0:107:103,1:103:2029:1:12:0,-30.2088,-181.628 0/0:124:119,3:119:2538:3:28:0,-34.1737,-225.986 0/0:119:116,1:116:2262:1:12:0,-34.1188,-202.579 0/0:113:111,2:111:1995:2:12:0,-32.8764,-178.568 0/0:126:125,0:125:2498:0:0:0,-37.6287,-224.944  0/0:114:110,3:110:2030:3:21:0,-32.0964,-180.914 0/0:119:117,0:117:2379:0:0:0,-35.2205,-214.238  0/0:135:129,4:129:2247:4:16:0,-38.587,-200.899  0/0:121:119,1:119:2436:1:6:0,-35.5636,-218.82   0/0:131:129,1:129:2527:1:9:0,-38.3089,-226.709  0/0:124:120,3:120:2347:3:15:0,-35.6842,-209.993
DLXXXXX.4   688 .   T   C   0.0 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=139;CIGAR=1X;DP=2122;DPB=2122;DPRA=0;EPP=216.861;EPPR=211.476;GTI=0;LEN=1;MEANALT=1.77778;MQM=60;MQMR=59.9424;NS=18;NUMALT=1;ODDS=117.062;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=922;QR=30518;RO=1962;RPL=0;RPP=304.845;RPPR=4263.44;RPR=139;RUN=1;SAF=11;SAP=216.861;SAR=128;SRF=764;SRP=211.476;SRR=1198;TYPE=snp;technology.Nanopore=1;ANN=C|synonymous_variant|LOW|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.498T>C|p.Pro166Pro|498/561|498/561|166/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-131T>C|||||131|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1388T>C|||||1388|WARNING_TRANSCRIPT_INCOMPLETE   GT:DP:AD:RO:QR:AO:QA:GL 0/0:112:103,8:103:1774:8:56:0,-28.3743,-154.704 0/0:112:102,9:102:1387:9:56:0,-28.3903,-119.483 0/0:125:118,5:118:1862:5:36:0,-33.7889,-164.406 0/0:117:111,6:111:1541:6:51:0,-30.5455,-134.141 0/0:116:105,8:105:1632:8:56:0,-28.9958,-141.906 0/0:100:95,5:95:1575:5:35:0,-26.883,-138.685    0/0:129:118,11:118:1771:11:74:0,-32.1056,-152.785   0/0:106:101,3:101:1478:3:22:0,-29.3011,-131.116 0/0:123:115,7:115:1920:7:44:0,-32.7719,-168.922 0/0:116:112,4:112:1792:4:26:0,-32.5145,-159.016 0/0:112:100,10:100:1416:10:65:0,-27.28,-121.655 0/0:124:118,6:118:1846:6:33:0,-34.3027,-163.038 0/0:113:97,15:97:1540:15:86:0,-26.0529,-130.944 0/0:118:112,5:112:1813:5:27:0,-32.7788,-160.836 0/0:130:117,11:117:1679:11:62:0,-33.0099,-145.591   0/0:118:110,7:110:1816:7:66:0,-29.2509,-157.556 0/0:129:121,8:121:1809:8:52:0,-34.0879,-158.206 0/0:122:107,11:107:1867:11:75:0,-28.7803,-161.359
DLXXXXX.4   689 .   G   A   7.66643E-14 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=80;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=60.4457;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=491;QR=31152;RO=2017;RPL=0;RPP=176.728;RPPR=4382.87;RPR=80;RUN=1;SAF=63;SAP=60.4457;SAR=17;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=A|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>A|p.Val167Ile|499/561|499/561|167/186||,A|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>A|||||130|,A|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>A|||||1387|WARNING_TRANSCRIPT_INCOMPLETE    GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,6:108:1864:6:35:0,-31.2324,-164.69  0/0:112:102,1:102:1407:1:6:0,-30.4676,-125.625  0/0:130:125,4:125:1886:4:25:0,-36.6049,-167.56  0/0:121:115,2:115:1587:2:21:0,-33.3197,-140.985 0/0:120:111,6:111:1620:6:35:0,-32.1183,-142.706 0/0:103:95,4:95:1533:4:23:0,-27.7357,-135.986   0/0:132:119,8:119:1850:8:39:0,-34.7647,-163.049 0/0:110:96,8:96:1450:8:50:0,-26.8143,-126.069   0/0:132:121,3:121:1993:3:12:0,-36.2692,-178.336 0/0:119:109,6:109:1723:6:32:0,-31.8144,-152.233 0/0:114:106,4:106:1526:4:18:0,-31.5246,-135.78  0/0:121:116,2:116:1825:2:9:0,-34.7402,-161.873  0/0:114:103,7:103:1532:7:74:0,-26.4452,-131.259 0/0:120:115,3:115:1840:3:31:0,-32.6906,-162.883 0/0:133:122,6:122:1906:6:20:0,-36.7747,-169.812 0/0:120:114,2:114:1884:2:5:0,-34.4911,-169.191  0/0:133:121,7:121:1785:7:50:0,-34.0334,-156.212 0/0:122:119,1:119:1941:1:6:0,-35.5786,-174.246
DLXXXXX.4   689 .   G   C   7.66643E-14 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=59;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=98.7391;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=602;QR=31152;RO=2017;RPL=0;RPP=131.127;RPPR=4382.87;RPR=59;RUN=1;SAF=4;SAP=98.7391;SAR=55;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=C|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>C|p.Val167Leu|499/561|499/561|167/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>C|||||130|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>C|||||1387|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,2:108:1864:2:29:0,-30.4817,-165.225 0/0:112:102,7:102:1407:7:57:0,-27.665,-121.037  0/0:130:125,1:125:1886:1:9:0,-37.1143,-168.997  0/0:121:115,2:115:1587:2:20:0,-33.4122,-141.075 0/0:120:111,2:111:1620:2:28:0,-31.4799,-143.332 0/0:103:95,2:95:1533:2:11:0,-28.212,-137.064    0/0:132:119,4:119:1850:4:40:0,-33.4161,-162.954 0/0:110:96,5:96:1450:5:42:0,-26.6151,-126.785   0/0:132:121,7:121:1993:7:91:0,-30.2847,-171.225 0/0:119:109,4:109:1723:4:72:0,-27.4863,-148.627 0/0:114:106,3:106:1526:3:33:0,-29.822,-134.428  0/0:121:116,1:116:1825:1:10:0,-34.3175,-161.782 0/0:114:103,3:103:1532:3:34:0,-28.8445,-134.857 0/0:120:115,2:115:1840:2:7:0,-34.6178,-165.044  0/0:133:122,5:122:1906:5:56:0,-33.1551,-166.568 0/0:120:114,4:114:1884:4:36:0,-32.2382,-166.401 0/0:133:121,3:121:1785:3:18:0,-35.7119,-159.09  0/0:122:119,2:119:1941:2:9:0,-35.6246,-173.977
freebayes variant-frequency • 937 views

I don't understand what "the frequency of each variant per sample" means. There is only one genotype at the intersection of a variant and a sample. This genotype may or may not contain the ALT allele.


What I'm looking for is the number of reads that support a variant in each sample. Is it DP?


So this is not the "frequency of each variant per sample".


Should I correct the title of my question?

7 months ago

"what I'm looking for is the number of reads which support a variant per sample. is it DP ?"

for each genotype you'll find FORMAT/AD:

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

It's an array: the number of reads supporting REF, followed by the number of reads supporting ALT.

You'll also find FORMAT/DP, which is the total number of reads for that genotype:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
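
To turn AD and DP into a per-sample fraction of ALT-supporting reads, something like the following could work. This is only a minimal sketch in plain Python: the function name `alt_fractions` and the sample names `S1`/`S2` are made up for illustration, the INFO column is trimmed, and the genotype values are the first two sample columns of the record at position 687 in the question. It uses DP as the denominator, which is an assumption; RO + AO would be another reasonable choice.

```python
def alt_fractions(header_line, record_line):
    """Map each sample name to its per-ALT read fractions (AD alt count / DP)."""
    samples = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP:AD:...
    out = {}
    for name, column in zip(samples, fields[9:]):
        values = dict(zip(keys, column.split(":")))
        ad, dp = values.get("AD", "."), values.get("DP", ".")
        if "." in ad.split(",") or dp in (".", "0"):
            out[name] = None  # missing data or zero depth
            continue
        alt_counts = [int(x) for x in ad.split(",")[1:]]  # skip the REF count
        out[name] = [count / int(dp) for count in alt_counts]
    return out


# Hypothetical sample names; INFO is shortened for readability.
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "S1", "S2"])
record = "\t".join(["DLXXXXX.4", "687", ".", "C", "T", "4.38769E-13", ".",
                    "AO=39;DP=2179", "GT:DP:AD:RO:QR:AO:QA:GL",
                    "0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029",
                    "0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731"])

for sample, fractions in alt_fractions(header, record).items():
    print(sample, fractions)
```

For the first sample this gives 2 ALT reads out of 116, i.e. about 1.7%. For a real file, a dedicated VCF parser is probably safer than splitting lines by hand.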

Thank you. One last question: should I extract these values before or after haplotype phasing, or does it not matter?

7 months ago
Jeremy ▴ 930

AF is allele frequency. You can see the VCF specification sheet here:

VCF


Thank you, but how could it be AF? There is only one AF value per line. I am looking for the frequency of each variant per sample, so I would expect as many values as samples (I have 18 samples, so there should be 18 values per variant line).
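
A quick check on the record at position 687 seems to support this: the INFO-level AO and DP are totals pooled over all 18 samples, not per-sample values. The sketch below copies the per-sample AO and DP numbers by hand from the genotype columns in the question; the variable names are only illustrative.

```python
# Per-sample AO and DP, copied from the 18 genotype columns at position 687.
per_sample_ao = [2, 6, 2, 2, 2, 1, 5, 1, 3, 1, 2, 0, 3, 0, 4, 1, 1, 3]
per_sample_dp = [116, 118, 127, 125, 119, 105, 136, 107, 124,
                 119, 113, 126, 114, 119, 135, 121, 131, 124]

# INFO-level values from the same record (AO=39;DP=2179).
info_ao, info_dp = 39, 2179

# The INFO fields are the sums over samples.
print(sum(per_sample_ao) == info_ao, sum(per_sample_dp) == info_dp)

# Pooled read fraction across all samples versus the per-sample fractions.
pooled = info_ao / info_dp
per_sample = [ao / dp for ao, dp in zip(per_sample_ao, per_sample_dp)]
print(round(pooled, 4), [round(f, 4) for f in per_sample])
```

Note that INFO/AF on that line is 0 even though a small share of reads carries the ALT allele. As far as I understand, this is because AF is derived from the called genotypes (AC=0 out of AN=36, all samples 0/0) rather than from read counts, so the per-sample values have to come from the FORMAT fields.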
