I am trying to do the variant calling on pooled samples. For that, I am using CRISP. I finished the variant calling and obtained a VCF with the following structure:
##fileformat=VCFv4.0
##fileDate=Fri Sep 9 10:20:58 2016
##source=CRISP_V0.1
##reference=dmel-all-chromosome-r5.39.fasta
##options: poolsize=2,bamfiles=14,qvoffset=33,bedfile=None,min_base_quality=13
##INFO=<ID=NP,Number=1,Type=Integer,Description="Number of Pools With Data">
##INFO=<ID=DP,Number=2,Type=Integer,Description="Total number of reads (+strand,-strand) across all pools (filtered reads only)">
##INFO=<ID=CT,Number=.,Type=Float,Description="contingency table p-value for each variant allele in same order as listed in column 5">
##INFO=<ID=VP,Number=.,Type=Integer,Description="Number of Pools with variant allele(s)">
##INFO=<ID=HP,Number=.,Type=Integer,Description="Ambiguity in positioning of indel (homopolymer run length or microsatellite length)">
##INFO=<ID=MQ,Number=.,Type=Integer,Description="# of reads with mapping qualities 0-9,10-19,20-39,40-255">
##FILTER=<ID=StrandBias,Description="strand bias in distribution of reads between reference and variant alleles on two strands">
##FILTER=<ID=LowDepth,Description="low average coverage: less than 1 (filtered) read per haplotype across all samples ">
##FILTER=<ID=LowMQ20,Description=" >20 percent of reads have mapping quality score less than 20">
##FILTER=<ID=LowMQ10,Description=" >10 percent of reads have mapping quality score less than 20">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MLAC,Number=.,Type=Integer,Description="Maximum likelihood estimate for the allele counts for the ALT allele(s), in the same order as listed in column 5">
##FORMAT=<ID=AF,Number=.,Type=Float,Description="variant allele frequency in pool, one value for each variant allele listed in column 5">
##FORMAT=<ID=ADf,Number=.,Type=Integer,Description="Number of reads aligned to the forward strand of the genome supporting reference allele and the alternate alleles in the order listed">
##FORMAT=<ID=ADr,Number=.,Type=Integer,Description="Number of reads aligned to the reverse strand of the genome supporting reference allele and the alternate alleles in the order listed">
##FORMAT=<ID=ADb,Number=.,Type=Integer,Description="Number of overlapping paired-end reads (read from both strands) supporting reference allele and the alternate alleles in the order listed">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRR1525694 SRR1525697 SRR1525768 SRR1525771 SRR1525774 SRR1525695 SRR1525698 SRR1525769 SRR1525772 SRR1525685 SRR1525696 SRR1525699 SRR1525770 SRR1525773
YHet 49 . C G 4074 LowDepth NP=14;DP=67,43,1;VT=SNV;CT=-0.0;VP=11;VF=EMpass;AC=1790;AF=0.99940;EMstats=407.46:-154.93;HWEstats=-0.0;MQS=0,2,83,29;FLANKSEQ=cattcatgtt:C:ccgctcttgc MLAC:GQ:DP:ADf:ADr .:12:1:0,1:0,0 .:9:8:0,4:0,4 .:13:20:0,10:0,10 .:14:7:0,2:0,4 .:12:5:0,4:0,1 .:13:15:0,9:0,5 .:10:8:0,5:0,3 .:11:29:0,18:0,11 .:10:2:0,2:0,0 .:13:5:0,1:0,3 .:13:7:0,5:0,1 .:10:0:0,0:0,0 .:9:4:0,4:0,0 .:13:3:0,2:0,1
YHet 50 . C T 4216 LowDepth NP=14;DP=69,43,1;VT=SNV;CT=-0.0;VP=11;VF=EMpass;AC=1790;AF=0.99945;EMstats=421.67:-162.93;HWEstats=-0.0;MQS=0,2,83,30;FLANKSEQ=attcatgttc:C:cgctcttgct MLAC:GQ:DP:ADf:ADr .:13:1:0,1:0,0 .:10:8:0,4:0,4 .:13:20:0,10:0,10 .:15:7:0,2:0,4 .:13:5:0,4:0,1 .:13:15:0,10:0,5 .:10:8:0,5:0,3 .:12:30:0,19:0,11 .:11:2:0,2:0,0 .:14:5:0,1:0,3 .:13:7:0,5:0,1 .:11:0:0,0:0,0 .:9:4:0,4:0,0 .:13:3:0,2:0,1
YHet 51 . C T 4248 LowDepth NP=14;DP=68,45,1;VT=SNV;CT=-0.0;VP=12;VF=EMpass;AC=1790;AF=0.99890;EMstats=424.84:-166.56;HWEstats=-0.0;MQS=0,2,86,30;FLANKSEQ=ttcatgttcc:C:gctcttgctt MLAC:GQ:DP:ADf:ADr .:10:1:0,1:0,0 .:7:8:0,4:0,4 .:10:20:0,10:0,10 .:12:7:0,2:0,4 .:10:5:0,4:0,1 .:10:15:0,9:0,5 .:7:8:0,5:0,3 .:9:31:0,19:0,11 .:8:3:0,2:0,1 .:11:6:0,1:0,4 .:10:7:0,5:0,1 .:8:0:0,0:0,0 .:6:4:0,4:0,0 .:10:3:0,2:0,1
YHet 195 . T A 12816 LowDepth NP=14;DP=143,194,5;VT=SNV;CT=-0.6;VP=13;VF=EMpass;AC=1789;AF=0.99645;EMstats=1281.69:-20.31;HWEstats=-0.0;MQS=0,0,203,143;FLANKSEQ=ccatttccta:T:taagagtaat MLAC:GQ:DP:ADf:ADr:ADb .:5:4:1,0:0,3:0,0 .:2:14:0,5:0,7:0,0 .:6:62:0,24:0,37:0,0 .:8:23:0,7:0,15:0,1 .:5:10:0,1:0,9:0,0 .:6:44:0,20:0,24:0,0 .:3:16:0,8:0,7:0,1 .:6:96:0,38:0,58:0,0 .:3:3:0,2:0,1:0,0 .:6:14:0,12:0,2:0,0 .:6:23:0,6:0,16:0,1 .:3:4:0,4:0,0:0,0 .:1:12:0,5:0,6:0,0 .:6:21:0,10:0,9:0,2
YHet 196 . T A 12627 LowDepth NP=14;DP=143,189,4;VT=SNV;CT=-0.0;VP=14;VF=EMpass;AC=1790;AF=0.99992;EMstats=1262.76:-548.38;HWEstats=-0.0;MQS=0,1,198,143;FLANKSEQ=catttcctat:T:aagagtaatt MLAC:GQ:DP:ADf:ADr:ADb .:22:4:0,1:0,3:0,0 .:18:14:0,5:0,7:0,0 .:23:62:0,24:0,36:0,0 .:24:23:0,7:0,15:0,1 .:22:10:0,1:0,9:0,0 .:23:43:0,20:0,23:0,0 .:19:17:0,8:0,7:0,1 .:22:94:0,38:0,56:0,0 .:19:3:0,2:0,1:0,0 .:23:14:0,12:0,2:0,0 .:22:22:0,6:0,15:0,1 .:20:4:0,4:0,0:0,0 .:18:12:0,5:0,6:0,0 .:22:20:0,10:0,9:0,1
YHet 1544 . A T 8596 LowDepth NP=14;DP=85,148,3;VT=SNV;CT=-0.9;VP=13;VF=EMpass;AC=1769;AF=0.98609;EMstats=859.60:-328.72;HWEstats=-0.0;MQS=0,4,84,156;FLANKSEQ=ttcaaatgat:A:aaaaaaaaca MLAC:GQ:DP:ADf:ADr:ADb .:1:11:0,6:0,4:0,0 .:1:11:0,4:0,5:0,0 .:1:40:0,9:4,25:0,1 .:1:9:0,3:0,6:0,0 .:1:13:0,4:0,9:0,0 .:1:25:0,9:0,16:0,0 .:0:19:0,11:0,7:0,0 .:2:57:0,15:0,42:0,0 .:0:3:0,0:0,3:0,0 .:0:10:1,1:0,7:0,0 .:1:15:0,8:0,7:0,0 .:0:7:0,4:0,3:0,0 .:0:8:0,2:0,5:0,1 .:1:16:0,8:0,5:0,1
YHet 1835 . C A 8903 LowDepth NP=14;DP=135,103,0;VT=SNV;CT=-0.1;VP=13;VF=EMpass;AC=1790;AF=0.99929;EMstats=890.31:-386.89;HWEstats=-0.0;MQS=0,5,181,63;FLANKSEQ=gaagagtaca:C:cagcatatcc MLAC:GQ:DP:ADf:ADr .:12:9:0,5:0,3 .:9:14:0,6:0,4 .:12:34:0,19:0,15 .:14:14:0,14:0,0 .:12:7:0,2:0,5 .:12:21:0,13:0,8 .:9:14:0,5:0,8 .:12:63:0,29:0,33 .:10:3:0,3:0,0 .:13:9:0,5:0,4 .:13:27:0,15:0,11 .:10:4:0,1:0,2 .:8:7:0,5:0,2 .:13:23:0,13:0,8
YHet 1844 . C A 8752 LowDepth NP=14;DP=141,100,0;VT=SNV;CT=-0.0;VP=14;VF=EMpass;AC=1790;AF=0.99990;EMstats=875.25:-343.76;HWEstats=-0.0;MQS=0,3,162,85;FLANKSEQ=accagcatat:C:ccgtgagtgg MLAC:GQ:DP:ADf:ADr .:21:9:0,5:0,4 .:17:10:0,5:0,2 .:21:35:0,22:0,13 .:23:13:0,12:0,0 .:20:6:0,2:0,4 .:21:21:0,13:0,8 .:18:14:0,4:0,8 .:20:70:0,34:0,35 .:18:4:0,3:0,1 .:22:9:0,6:0,3 .:21:28:0,16:0,11 .:18:3:0,1:0,2 .:17:6:0,4:0,2 .:21:22:0,14:0,7
Now I am going to write a script to transform this into a VCF with allele frequencies (because I haven't found one that does that), but I am not sure what I should use for that.
Should I sum the ADf and ADr of one of the alleles and divide by DP? What should I do when there are reads overlapping (ADb)?
Thanks!
Thanks for your answer! What about ADb?