Question

frequency of each variant per sample

12 months ago

emilydolivo97 ▴ 10

Hello,

I applied freebayes to my different samples, generated a VCF file, and annotated it.

I would like to know how I can determine the frequency of each variant per sample.

Is it AO or DP, or something else?

My vcf file looks like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sxxxxxxxx9.fastq Sxxxxxxxx8.fastq Sxxxxxxxx3.fastq Sxxxxxxxx0.fastq Sxxxxxxx10.fastq Sxxxxxxx49.fastq Sxxxxxxxx5.fastq Sxxxxxxxx2.fastq Sxxxxxxxx7.fastq Sxxxxxxxx1.fastq Sxxxxxx341.fastq Sxxxxxx746.fastq Sxxxxxx887.fastq Sxxxxxxx72.fastq Sxxxxxx413.fastq Sxxxxxxx08.fastq Sxxxxxx494.fastq Sxxxxxxx84.fastq
DLXXXXX.4   687 .   C   T   4.38769E-13 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=39;CIGAR=1X;DP=2179;DPB=2179;DPRA=0.986735;EPP=63.6445;EPPR=370.696;GTI=0;LEN=1;MEANALT=1.75;MQM=60;MQMR=59.9722;NS=18;NUMALT=1;ODDS=135.142;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=259;QR=40628;RO=2119;RPL=0;RPP=87.6977;RPPR=4604.36;RPR=39;RUN=1;SAF=3;SAP=63.6445;SAR=36;SRF=760;SRP=370.696;SRR=1359;TYPE=snp;technology.Nanopore=1;ANN=T|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.497C>T|p.Pro166Leu|497/561|497/561|166/186||,T|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-132C>T|||||132|,T|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1389C>T|||||1389|WARNING_TRANSCRIPT_INCOMPLETE    GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029  0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731 0/0:127:124,2:124:2482:2:8:0,-37.2564,-222.749  0/0:125:122,2:122:2064:2:21:0,-35.4144,-183.924 0/0:119:117,2:117:2089:2:15:0,-34.3976,-186.748 0/0:105:102,1:102:2034:1:10:0,-30.0894,-182.259 0/0:136:130,5:130:2349:5:32:0,-37.7391,-208.61  0/0:107:103,1:103:2029:1:12:0,-30.2088,-181.628 0/0:124:119,3:119:2538:3:28:0,-34.1737,-225.986 0/0:119:116,1:116:2262:1:12:0,-34.1188,-202.579 0/0:113:111,2:111:1995:2:12:0,-32.8764,-178.568 0/0:126:125,0:125:2498:0:0:0,-37.6287,-224.944  0/0:114:110,3:110:2030:3:21:0,-32.0964,-180.914 0/0:119:117,0:117:2379:0:0:0,-35.2205,-214.238  0/0:135:129,4:129:2247:4:16:0,-38.587,-200.899  0/0:121:119,1:119:2436:1:6:0,-35.5636,-218.82   0/0:131:129,1:129:2527:1:9:0,-38.3089,-226.709  0/0:124:120,3:120:2347:3:15:0,-35.6842,-209.993
DLXXXXX.4   688 .   T   C   0.0 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=139;CIGAR=1X;DP=2122;DPB=2122;DPRA=0;EPP=216.861;EPPR=211.476;GTI=0;LEN=1;MEANALT=1.77778;MQM=60;MQMR=59.9424;NS=18;NUMALT=1;ODDS=117.062;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=922;QR=30518;RO=1962;RPL=0;RPP=304.845;RPPR=4263.44;RPR=139;RUN=1;SAF=11;SAP=216.861;SAR=128;SRF=764;SRP=211.476;SRR=1198;TYPE=snp;technology.Nanopore=1;ANN=C|synonymous_variant|LOW|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.498T>C|p.Pro166Pro|498/561|498/561|166/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-131T>C|||||131|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1388T>C|||||1388|WARNING_TRANSCRIPT_INCOMPLETE   GT:DP:AD:RO:QR:AO:QA:GL 0/0:112:103,8:103:1774:8:56:0,-28.3743,-154.704 0/0:112:102,9:102:1387:9:56:0,-28.3903,-119.483 0/0:125:118,5:118:1862:5:36:0,-33.7889,-164.406 0/0:117:111,6:111:1541:6:51:0,-30.5455,-134.141 0/0:116:105,8:105:1632:8:56:0,-28.9958,-141.906 0/0:100:95,5:95:1575:5:35:0,-26.883,-138.685    0/0:129:118,11:118:1771:11:74:0,-32.1056,-152.785   0/0:106:101,3:101:1478:3:22:0,-29.3011,-131.116 0/0:123:115,7:115:1920:7:44:0,-32.7719,-168.922 0/0:116:112,4:112:1792:4:26:0,-32.5145,-159.016 0/0:112:100,10:100:1416:10:65:0,-27.28,-121.655 0/0:124:118,6:118:1846:6:33:0,-34.3027,-163.038 0/0:113:97,15:97:1540:15:86:0,-26.0529,-130.944 0/0:118:112,5:112:1813:5:27:0,-32.7788,-160.836 0/0:130:117,11:117:1679:11:62:0,-33.0099,-145.591   0/0:118:110,7:110:1816:7:66:0,-29.2509,-157.556 0/0:129:121,8:121:1809:8:52:0,-34.0879,-158.206 0/0:122:107,11:107:1867:11:75:0,-28.7803,-161.359
DLXXXXX.4   689 .   G   A   7.66643E-14 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=80;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=60.4457;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=491;QR=31152;RO=2017;RPL=0;RPP=176.728;RPPR=4382.87;RPR=80;RUN=1;SAF=63;SAP=60.4457;SAR=17;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=A|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>A|p.Val167Ile|499/561|499/561|167/186||,A|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>A|||||130|,A|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>A|||||1387|WARNING_TRANSCRIPT_INCOMPLETE    GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,6:108:1864:6:35:0,-31.2324,-164.69  0/0:112:102,1:102:1407:1:6:0,-30.4676,-125.625  0/0:130:125,4:125:1886:4:25:0,-36.6049,-167.56  0/0:121:115,2:115:1587:2:21:0,-33.3197,-140.985 0/0:120:111,6:111:1620:6:35:0,-32.1183,-142.706 0/0:103:95,4:95:1533:4:23:0,-27.7357,-135.986   0/0:132:119,8:119:1850:8:39:0,-34.7647,-163.049 0/0:110:96,8:96:1450:8:50:0,-26.8143,-126.069   0/0:132:121,3:121:1993:3:12:0,-36.2692,-178.336 0/0:119:109,6:109:1723:6:32:0,-31.8144,-152.233 0/0:114:106,4:106:1526:4:18:0,-31.5246,-135.78  0/0:121:116,2:116:1825:2:9:0,-34.7402,-161.873  0/0:114:103,7:103:1532:7:74:0,-26.4452,-131.259 0/0:120:115,3:115:1840:3:31:0,-32.6906,-162.883 0/0:133:122,6:122:1906:6:20:0,-36.7747,-169.812 0/0:120:114,2:114:1884:2:5:0,-34.4911,-169.191  0/0:133:121,7:121:1785:7:50:0,-34.0334,-156.212 0/0:122:119,1:119:1941:1:6:0,-35.5786,-174.246
DLXXXXX.4   689 .   G   C   7.66643E-14 .   AB=0;ABP=0;AC=0;AF=0;AN=36;AO=59;CIGAR=1X;DP=2172;DPB=2172;DPRA=0;EPP=98.7391;EPPR=363.926;GTI=0;LEN=1;MEANALT=2.61111;MQM=60;MQMR=59.944;NS=18;NUMALT=2;ODDS=113.649;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=602;QR=31152;RO=2017;RPL=0;RPP=131.127;RPPR=4382.87;RPR=59;RUN=1;SAF=4;SAP=98.7391;SAR=55;SRF=719;SRP=363.926;SRR=1298;TYPE=snp;technology.Nanopore=1;ANN=C|missense_variant|MODERATE|NAD9|Gene_190_750|transcript|ABF29491.1|protein_coding|1/1|c.499G>C|p.Val167Leu|499/561|499/561|167/186||,C|upstream_gene_variant|MODIFIER|PvRXLR|Gene_818_1972|transcript|ABF29492.1|protein_coding||c.-130G>C|||||130|,C|upstream_gene_variant|MODIFIER|NAD5|Gene_2075_2280|transcript|ABF29493.1|protein_coding||c.-1387G>C|||||1387|WARNING_TRANSCRIPT_INCOMPLETE GT:DP:AD:RO:QR:AO:QA:GL 0/0:116:108,2:108:1864:2:29:0,-30.4817,-165.225 0/0:112:102,7:102:1407:7:57:0,-27.665,-121.037  0/0:130:125,1:125:1886:1:9:0,-37.1143,-168.997  0/0:121:115,2:115:1587:2:20:0,-33.4122,-141.075 0/0:120:111,2:111:1620:2:28:0,-31.4799,-143.332 0/0:103:95,2:95:1533:2:11:0,-28.212,-137.064    0/0:132:119,4:119:1850:4:40:0,-33.4161,-162.954 0/0:110:96,5:96:1450:5:42:0,-26.6151,-126.785   0/0:132:121,7:121:1993:7:91:0,-30.2847,-171.225 0/0:119:109,4:109:1723:4:72:0,-27.4863,-148.627 0/0:114:106,3:106:1526:3:33:0,-29.822,-134.428  0/0:121:116,1:116:1825:1:10:0,-34.3175,-161.782 0/0:114:103,3:103:1532:3:34:0,-28.8445,-134.857 0/0:120:115,2:115:1840:2:7:0,-34.6178,-165.044  0/0:133:122,5:122:1906:5:56:0,-33.1551,-166.568 0/0:120:114,4:114:1884:4:36:0,-32.2382,-166.401 0/0:133:121,3:121:1785:3:18:0,-35.7119,-159.09  0/0:122:119,2:119:1941:2:9:0,-35.6246,-173.977

freebayes variant-frequency • 1.3k views

updated 12 months ago by Pierre Lindenbaum 166k • written 12 months ago by emilydolivo97 ▴ 10
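To see which number is which, it helps to pair each key of the FORMAT column (`GT:DP:AD:RO:QR:AO:QA:GL`) with the matching value in one sample column. A minimal Python sketch, using the first sample of the position-687 record; `parse_sample` is a hypothetical helper name, not part of freebayes. In that sample, AO (2) and the second AD entry (2) count reads supporting ALT, RO (114) counts reads supporting REF, and DP (116) is the total depth.

```python
def parse_sample(format_field: str, sample_field: str) -> dict[str, str]:
    """Pair each FORMAT key (GT, DP, AD, ...) with the matching value in one sample column.

    VCF allows trailing sample values to be dropped; zip() simply stops early then.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


fmt = "GT:DP:AD:RO:QR:AO:QA:GL"
# First sample column of the record at position 687 in the question
first_sample = "0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029"
fields = parse_sample(fmt, first_sample)
print(fields["DP"], fields["AO"], fields["RO"], fields["AD"])  # → 116 2 114 114,2
```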

I don't understand what you mean by "the frequency of each variant per sample". There is only one genotype at the intersection of a variant and a sample. This genotype either contains the ALT allele or it doesn't.

written 12 months ago by Pierre Lindenbaum 166k
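Whether a genotype contains an ALT allele can be checked from the GT string alone. Every sample in the four records of the question is called 0/0, so no genotype carries ALT, even though most samples have a few ALT-supporting reads (AO > 0). A minimal sketch; `carries_alt` is a hypothetical helper, and it assumes standard VCF GT syntax (`/` unphased, `|` phased, `.` missing).

```python
import re


def carries_alt(gt: str) -> bool:
    """Return True if a VCF GT string (e.g. '0/1', '1|1', './.') contains any ALT allele index."""
    # Alleles are separated by '/' (unphased) or '|' (phased); '.' means a missing call.
    alleles = re.split(r"[/|]", gt)
    # Index 0 is REF; any other non-missing index refers to an ALT allele.
    return any(a not in (".", "0") for a in alleles)


for gt in ["0/0", "0/1", "1|1", "./.", "0/2"]:
    print(gt, carries_alt(gt))
```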

What I'm looking for is the number of reads that support a variant in each sample. Is it DP?

written 12 months ago by emilydolivo97 ▴ 10

So this is not the "frequency of each variant per sample".

written 12 months ago by Pierre Lindenbaum 166k

Should I correct the title of my question?

written 12 months ago by emilydolivo97 ▴ 10

Cross-posted: https://github.com/freebayes/freebayes/issues/792

written 12 months ago by Pierre Lindenbaum 166k

score 1 · Answer 1 · 2024-04-04

12 months ago

Pierre Lindenbaum 166k

What I'm looking for is the number of reads that support a variant in each sample. Is it DP?

For each genotype you'll find FORMAT/AD:

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

It's an array: the number of reads supporting REF, followed by the number of reads supporting each ALT allele.
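Because AD has Number=R (one REF count, then one count per ALT), a per-sample allele fraction is each count divided by the sum of AD. A minimal sketch using the first sample of the position-687 record (AD=114,2); `allele_fractions` is a hypothetical helper, and treating a missing value `.` as 0 is an assumption made here for simplicity.

```python
def allele_fractions(ad: str) -> list[float]:
    """Turn a FORMAT/AD string like '114,2' into per-allele read fractions.

    AD has Number=R: one count for REF followed by one count per ALT allele.
    Missing values ('.') are treated as 0 (an assumption of this sketch).
    """
    counts = [0 if c == "." else int(c) for c in ad.split(",")]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # no coverage: avoid dividing by zero
    return [c / total for c in counts]


ref_frac, alt_frac = allele_fractions("114,2")
print(f"REF {ref_frac:.4f}  ALT {alt_frac:.4f}")  # → REF 0.9828  ALT 0.0172
```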

You'll also find FORMAT/DP, which is the total read depth for each genotype (i.e. each sample):

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

written 12 months ago by Pierre Lindenbaum 166k
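DP is not always equal to RO + AO, so AO/DP and AO/(RO+AO) can differ slightly. The split records at position 689 show why: for the first sample, DP=116 in both records, the G>A record gives AD=108,6 and the G>C record gives AD=108,2, and 108 + 6 + 2 = 116. DP therefore also counts reads supporting the other ALT allele. A minimal sketch of both denominators, with the numbers copied from the question; `alt_fraction` is a hypothetical helper.

```python
def alt_fraction(ao: int, denominator: int) -> float:
    """Fraction of reads supporting one ALT allele; 0.0 when there is no coverage."""
    return ao / denominator if denominator else 0.0


# First sample at position 689, taken from the two split records in the question
dp = 116            # FORMAT/DP (same value in both records)
ro = 108            # FORMAT/RO
ao_a, ao_c = 6, 2   # FORMAT/AO in the G>A and G>C records

print(ro + ao_a + ao_c == dp)  # DP also counts the other ALT's reads
print(f"A: AO/DP={alt_fraction(ao_a, dp):.4f}  AO/(RO+AO)={alt_fraction(ao_a, ro + ao_a):.4f}")
```

This prints `True`, then `A: AO/DP=0.0517  AO/(RO+AO)=0.0526`. Which denominator to report is a choice; AO/DP dilutes the fraction by reads supporting other alleles.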

Thank you, sir. One last question: should I extract these values before or after haplotype phasing, or does it not matter?

written 12 months ago by emilydolivo97 ▴ 10

score 0 · Answer 2 · 2024-04-03

12 months ago

Jeremy ▴ 930

AF is the allele frequency. You can find its definition in the VCF specification:

VCF

written 12 months ago by Jeremy ▴ 930
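INFO fields describe the whole site, not one sample. In the position-687 record, INFO/AF (0) equals AC/AN = 0/36: zero ALT alleles among the 36 alleles called across 18 diploid samples. INFO/AO=39 and INFO/DP=2179 are the sums of the 18 per-sample AO and DP values in that line. A minimal sketch that reads the INFO column, using a shortened copy of it (ANN and most keys omitted); `parse_info` is a hypothetical helper.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into key -> value (flags without '=' map to '')."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


# Shortened INFO column of the position-687 record
info = parse_info("AB=0;AC=0;AF=0;AN=36;AO=39;DP=2179;NS=18;NUMALT=1")
ac, an, af = int(info["AC"]), int(info["AN"]), float(info["AF"])
print(ac / an == af)           # → True: one site-level value, not one per sample
print(info["AO"], info["DP"])  # → 39 2179: totals summed over all samples
```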

Thank you, sir, but how could it be AF? There is only one AF value per line. I am looking for the frequency of each variant per sample, so I should have as many values as samples (I have 18 samples, so I expect 18 values per variant/line).

written 12 months ago by emilydolivo97 ▴ 10
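To get one value per sample per line (18 per record here), the per-sample ratio has to be computed from FORMAT fields rather than read from INFO. A minimal sketch, assuming a tab-separated VCF with one ALT per record (as in the question) and FORMAT/AO and FORMAT/DP present; it uses AO/DP, though AO/(RO+AO) is an equally defensible choice. `per_sample_alt_fractions` is a hypothetical helper, and the demo uses only the first three samples of the position-687 record.

```python
def per_sample_alt_fractions(lines):
    """Yield (chrom, pos, ref, alt, {sample: AO/DP}) for each data line of a VCF.

    Assumptions: tab-separated columns, one ALT allele per record, and
    FORMAT/AO and FORMAT/DP present; missing or zero-depth values give None.
    """
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        keys = cols[8].split(":")
        fractions = {}
        for name, sample_col in zip(samples, cols[9:]):
            f = dict(zip(keys, sample_col.split(":")))
            ao, dp = f.get("AO", "."), f.get("DP", ".")
            if ao == "." or dp == "." or int(dp) == 0:
                fractions[name] = None
            else:
                fractions[name] = int(ao) / int(dp)
        yield cols[0], int(cols[1]), cols[3], cols[4], fractions


header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          "Sxxxxxxxx9.fastq", "Sxxxxxxxx8.fastq", "Sxxxxxxxx3.fastq"]
record = ["DLXXXXX.4", "687", ".", "C", "T", "4.38769E-13", ".", "AC=0;AF=0;AN=36",
          "GT:DP:AD:RO:QR:AO:QA:GL",
          "0/0:116:114,2:114:2317:2:7:0,-34.2545,-208.029",
          "0/0:118:112,6:112:2005:6:35:0,-32.3132,-176.731",
          "0/0:127:124,2:124:2482:2:8:0,-37.2564,-222.749"]
demo_vcf = "\t".join(header) + "\n" + "\t".join(record) + "\n"

for chrom, pos, ref, alt, fractions in per_sample_alt_fractions(demo_vcf.splitlines()):
    for name, value in fractions.items():
        shown = f"{value:.4f}" if value is not None else "NA"
        print(chrom, pos, ref, alt, name, shown)
```

On a real file you would pass an open handle instead, e.g. `with open("calls.vcf") as fh:` (hypothetical filename). If bcftools is available, the raw per-sample counts can also be dumped with `bcftools query -H -f '%CHROM\t%POS\t%REF\t%ALT[\t%AO\t%DP]\n' calls.vcf` and divided afterwards.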