6.2 years ago
SOHAIL
Hi everyone,
I have a VCF file in which the REF/ALT call at some positions of the genome is ambiguous: the REF allele is Y, R, M, K, S, or W (i.e. one of the two-base ambiguity codes). For example:
3 60830534 . M C 101 . . GT:DP:A:C:G:T:PP:GQ 1/1:24:0,0:14,9:0,0:1,0:1038,0,808,782,114,898,883,114,101,806:101,
Is there any way to remove all of these lines from the VCF file?
Kind regards, Sohail
@ATpoint Thanks for the reply!
Your command works well for a VCF file in which only variants are called (i.e. both the REF and ALT columns contain A/T/G/C).
However, my VCF file was called at all positions of the genome (homozygous-reference, homozygous-alternate, and heterozygous sites), together with the ambiguous calls, and column 5 (ALT) may contain a period (i.e. the dot symbol), e.g.
When I modified the command as follows, the ambiguous call is still there:
Am I making a mistake?
Sorry, I do not get it. Of the two lines above, the one where REF is M and the one where ALT is ".", which should be removed?
@ATpoint, the lines with M (and the other two-base ambiguity codes Y, R, W, K, S) in the VCF file should be removed.
edit: any help???
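For anyone landing here later, one possible approach is a minimal Python sketch like the one below. It is not the command discussed above. It assumes a plain-text, tab-separated VCF, keeps header lines, keeps records whose ALT is ".", and drops any record whose REF or ALT contains something other than A, C, G, T, or N. The function names `is_unambiguous` and `filter_vcf` are made up for this illustration, and symbolic ALT alleles such as `<DEL>` would also be dropped by this check, so adjust it if your file contains them.

```python
import re

# Plain alleles: one or more of A, C, G, T, N (either case).
# Assumption: N is kept; only other IUPAC codes (Y, R, M, K, S, W, ...) are rejected.
PLAIN_ALLELE = re.compile(r"^[ACGTNacgtn]+$")


def is_unambiguous(line):
    """Return True for header lines and for records without ambiguity codes."""
    if line.startswith("#"):
        return True  # keep meta-information and the column header
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return False  # too few columns to contain REF and ALT; drop the line
    ref, alt = fields[3], fields[4]
    if not PLAIN_ALLELE.match(ref):
        return False  # REF holds an ambiguity code such as M
    if alt == ".":
        return True  # no ALT allele called at this position
    # ALT may list several alleles separated by commas; "*" marks a spanning deletion.
    return all(a == "*" or PLAIN_ALLELE.match(a) for a in alt.split(","))


def filter_vcf(lines):
    """Return only the lines that pass is_unambiguous, in their original order."""
    return [line for line in lines if is_unambiguous(line)]
```

To use it, you could pass an open file handle to `filter_vcf` and write the returned lines to a new file with `writelines`. With the record from the question (REF = M, ALT = C) the line is dropped, while a homozygous-reference record with ALT = "." is kept.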