Entering edit mode
3.7 years ago
User000
▴
710
Dear all,
I have a particular vcf file like this,
chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2
I tried various tools to split this, but I get the following results, so the FORMAT and INFO lines are identical.
chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP 0/1/.:4839:0.003,0.001:0.002,0:0.005,0.003:14;0,4;2
chrX 29 . G T . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP 0/./1:4839:0.003,0.001:0.002,0:0.005,0.003:14;0,4;2
I need to get HF values split accordingly as well, like this:
chrX 29 . G A . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003:0.002,0.0:0.005,0.003:14;0,4;2
chrX 29 . G T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.001:0.002,0.0:0.005,0.003:14;0,4;2
How to do this?
EDIT: example.vcf
##fileformat=VCFv4.0
##reference=chrX
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Reads covering the REF position">
##FORMAT=<ID=HF,Number=.,Type=Float,Description="Heteroplasmy Frequency of variant allele">
##FORMAT=<ID=CILOW,Number=.,Type=Float,Description="Value defining the lower limit of the confidence interval of the heteroplasmy fraction">
##FORMAT=<ID=CIUP,Number=.,Type=Float,Description="Value defining the upper limit of the confidence interval of the heteroplasmy fraction">
##FORMAT=<ID=SDP,Number=.,Type=String,Description="Strand-specific read depth of the ALT allele">
##INFO=<ID=AC,Number=1,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chrX 886 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4813:0.002:0.001:0.004:9;0
chrX 895 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4813:0.001:0.0:0.003:5;0
chrX 898 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4829:0.001:0.0:0.002:4;1
chrX 899 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2
chrX 900 . C T . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4838:0.001:0.0:0.003:6;0
chrX 904 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4860:0.004:0.003:0.006:14;6
chrX 909 . G T . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4973:0.002:0.001:0.004:7;4
chrX 916 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4998:0.001:0.0:0.002:4;1
chrX 917 . C A . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4962:0.001:0.0:0.002:4;1
chrX 932 . C T . PASS AC=1;AN=2 GT:DP:HF:CILOW:CIUP:SDP 0/1:4884:0.001:0.0:0.003:4;2
chrX 933 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4863:0.001,0.001:0.0,0.001:0.002,0.003:5;0,3;4
I usually use this tool - https://github.com/imgag/ngs-bits/blob/master/doc/tools/VcfBreakMulti.md
and that would be wrong because AC should have only one value and FORMAT/GT can't be '0/1/2'....
which one ? did you try
bcftools norm
?It is OK if the tool would give a split results for GT, what I was meaning is I dont care about GT at all in that case, i need to get HF values split. Yes, I tried bcftools norm, vt decompose, split.py (a custom python script I found here) and some others that I do not remember the names, but none worked so far.
can you try posting example vcf with headers?
I updated the question, thanks a lot
Thanks.