How can I remove duplicated variants from a VCF file? I googled and searched the Biostars history, but I did not find any way to do it.
In fact, I intend to remove variants that share the same scaffold ID and position, keeping only one of them.
I strongly suggest you also use the REF information...
Sort on CHROM/POS/REF. Using awk, create a KEY=CHROM\tPOS\tREF and print the line only if the key differs from the previous one.
LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k4,4 input.vcf |\
awk -F '\t' '/^#/ {print;prev="";next;} {key=sprintf("%s\t%s\t%s",$1,$2,$4);if(key==prev) next;print;prev=key;}'
edit: added 'next; ' for VCF header.
Thanks Pierre for your answer. I ran your command and got a VCF file as output, but when I used bcftools stats I got this error:
Failed to open output.vcf: unknown file type
Why can't bcftools recognize the output as a VCF file? I need the output to be a VCF file for downstream analysis.
Ah yes, sorry. It's because sort messed up the VCF header, so ##fileformat= is no longer the first line.
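A quick way to confirm this diagnosis is to look at the first line of the file. The following is a minimal sketch; `vcf_header_ok` is a hypothetical helper name, not part of bcftools or any other tool.

```shell
# Hypothetical helper: succeeds only if the first line of the given
# (uncompressed) file is the ##fileformat= line.
vcf_header_ok() {
  head -n 1 "$1" | grep -q '^##fileformat='
}
```

For example, `vcf_header_ok output.vcf || echo "header is out of order"` would print the message for a file whose header was shuffled by sort.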
please try:
( grep '^#' input.vcf ; grep -v "^#" input.vcf | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n -k4,4 | awk -F '\t' 'BEGIN{ prev="";} {key=sprintf("%s\t%s\t%s",$1,$2,$4);if(key==prev) next;print;prev=key;}' ) > out.vcf
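If the original record order should be preserved, an alternative sketch skips the sort and lets awk remember every CHROM/POS/REF key it has already printed. This assumes an uncompressed VCF and enough memory to hold one key per distinct site, so for very large files the sort-based command above may be the safer choice. `dedup_vcf` is a hypothetical function name.

```shell
# Hypothetical helper: print header lines unchanged, then print each
# record only the first time its CHROM/POS/REF key is seen.
dedup_vcf() {
  awk -F '\t' '
    /^#/ { print; next }          # header lines pass through in place
    !seen[$1 FS $2 FS $4]++       # keep first occurrence of each key
  ' "$1"
}
```

Usage would look like `dedup_vcf input.vcf > out.vcf`. Because the header is never sorted, ##fileformat= stays on the first line.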
More options (just adding to keep threads linked based on common information): A: Remove duplicate SNPs only based on SNP ID in bcftools
Kevin
Do you know how to remove duplicated variants from a VCF file (ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz)? I am struggling with it.
Take the 1000 Genomes phase 3 data as an example:
Terrible, I still have duplicates after bcftools norm.
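One way to check whether duplicates remain is to count the CHROM/POS/REF keys that occur more than once. This is a minimal sketch using only standard Unix tools; `count_dup_sites` is a hypothetical function name, and it assumes an uncompressed VCF, so a .vcf.gz file would need to be decompressed first.

```shell
# Hypothetical helper: print the number of distinct CHROM/POS/REF keys
# that appear on more than one record line.
count_dup_sites() {
  grep -v '^#' "$1" |        # drop header lines
    cut -f 1,2,4 |           # keep CHROM, POS, REF
    LC_ALL=C sort |          # bring identical keys together
    uniq -d |                # one line per repeated key
    wc -l | tr -d ' '        # count them, stripping padding
}
```

A result of 0 means no two records share the same CHROM, POS and REF.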
Could you please explain what you mean by duplicated variants? Do you observe two lines in your VCF file that are exactly the same?
Yes, that is what I mean. I want to keep one of the duplicate variants and remove the rest. In fact, I intend to remove variants that share the same scaffold ID and position, keeping only one of them.