What is a quick way to unphase all the genotypes in a VCF file? That is, I want all the GT values to be of the form x/y (instead of x|y).
This sed one-liner in BASH appears to work for me:
sed '/^##/! s/|/\//g' INPUT.vcf > OUTPUT.vcf
...or, to replace directly in the file without creating a new one, use sed -i ...
[tested on linux / Ubuntu 16.04]
The first part of the sed command (/^##/!) restricts the substitution to lines that do not begin with ##, so pipe symbols in the VCF header are left alone. I can't imagine that pipe symbols would be used anywhere else in the VCF main body, other than [possibly] when an annotation program adds custom annotation to the INFO column; in that case, those pipes would be replaced as well.
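To make the effect of the address concrete, here is a small sketch that runs the same sed expression on a made-up two-line input. The ANN annotation, its description and its values are invented for illustration and do not come from a real file. The ## line should pass through untouched, while every pipe in the data line, including the one inside INFO, is turned into a slash.

```shell
# Toy input: one "##" header line and one tab-delimited data record.
# The ANN field is hypothetical and only there to show the INFO side effect.
demo_sed_unphase() {
  printf '##INFO=<ID=ANN,Description="Allele|Effect">\n1\t100\t.\tA\tG\t.\tPASS\tANN=G|missense\tGT\t0|1\n' |
    sed '/^##/! s/|/\//g'
}

demo_sed_unphase
```

If your file carries pipe-delimited annotations in INFO, this is the behaviour to watch out for before using sed -i on it.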
Another possibility would be to use awk in BASH in order to specifically change values in a particular column, but this would get cumbersome with multi-sample VCFs.
Kevin
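The column-specific awk idea mentioned above can be sketched roughly as follows. This is not the poster's code, just one way it might look, under these assumptions: the VCF is uncompressed and tab-delimited, sample columns start at column 10, and GT is the first key in the FORMAT column. The function name unphase_gt is made up for this example. It loops over all sample columns, so multi-sample files are handled, and it only rewrites the first colon-separated subfield of each sample. A bracket expression is used to match the literal pipe so the pattern does not depend on how a particular awk treats backslash escapes.

```shell
# Sketch: unphase only the GT subfield of every sample column.
# Assumes FORMAT (column 9) lists GT first; other records are printed unchanged.
unphase_gt() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }                 # pass all header lines through as-is
       $9 ~ /^GT(:|$)/ {
         for (i = 10; i <= NF; i++) {
           n = split($i, parts, ":")        # parts[1] holds the GT value
           gsub(/[|]/, "/", parts[1])       # replace the phased separator in GT only
           s = parts[1]
           for (j = 2; j <= n; j++) s = s ":" parts[j]
           $i = s
         }
       }
       { print }' "$@"
}
```

Usage would be along the lines of: unphase_gt INPUT.vcf > OUTPUT.vcf. Pipes in INFO and in non-GT sample subfields are left alone.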
Using vcffilterjdk (http://lindenb.github.io/jvarkit/VcfFilterJdk.html):
java -jar dist/vcffilterjdk.jar -e 'return new VariantContextBuilder(variant).genotypes(variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).phased(false).make()).collect(Collectors.toList())).make();' input.vcf
Tested with:
wget -O - "https://github.com/vcflib/vcflib/blob/master/samples/scaffold612.phased.vcf?raw=true" | java -jar dist/vcffilterjdk.jar -e 'return new VariantContextBuilder(variant).genotypes(variant.getGenotypes().stream().map(G->new GenotypeBuilder(G).phased(false).make()).collect(Collectors.toList())).make();'
An awk approach that only affects the sample (genotype) columns, i.e. column 10 onwards:
awk -F $'\t' 'BEGIN {OFS = FS} /^[#]/ {print; next} { for (i = 10; i <= NF; i++) { gsub("\|","/",$i) } print }'
the above didn't work for me, I had to modify it to: