Hi!
I have a VCF produced by MafFilter with 29 samples, in the following format (trimmed to 5 strains for easier reading):
##fileformat=VCFv4.0
##fileDate=202291
##source=Bio++
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=gap,Description="At least one sequence contains a gap">
##FILTER=<ID=unk,Description="At least one sequence contains an unresolved character">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Reference Strain01 Strain02 Strain03 Strain04 Strain05
chr09 191 . G A . PASS AC=1 GT 0 1 0 0 0 0
chr09 1229 . T C . PASS AC=1 GT 0 0 0 1 0 0
chr09 1233 . T G . PASS AC=1 GT 0 0 0 1 0 0
chr03 121013 . G T . PASS AC=29 GT 0 1 1 1 1 1
chr03 121017 . G A . PASS AC=29 GT 0 1 1 1 1 1
chr16 551745 . T A . PASS AC=28 GT 0 0 1 1 1 1
chr16 552420 . A G . PASS AC=26 GT 0 1 1 0 1 1
This VCF derives from a multiple genome alignment in which Reference is my reference genome and Strain01 is a collection strain. Strain02 to Strain29 are clones derived from Strain01 that were exposed to mutagens.
I'd like to remove all the SNPs present in Strain01 from the rest of my strains.
I used the following bcftools command:
bcftools view -e'AC=29' input.vcf.gz | bgzip -c > output.vcf.gz
This excludes all variants with AC=29 (meaning that the variant is present in all 29 strains). However, in some cases one or more strains lack a SNP from Strain01 while the rest of the strains carry it (e.g. AC=26 or AC=28). I can set a threshold (e.g. 20) and use:
bcftools view -e'AC>20' input.vcf.gz | bgzip -c > output.vcf.gz
But even with a threshold, some strains could still carry SNPs that are present in Strain01.
I was thinking of splitting the VCF into individual VCF files for each strain and then using bcftools isec or vcf-isec, but I'd prefer to work with the "full vcf".
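To make the goal concrete, the filtering I have in mind could be sketched with plain awk instead of a dedicated VCF tool. This is only a rough sketch under assumptions: `drop_background` is a hypothetical helper name, the VCF is uncompressed and tab-delimited, and GT is the first FORMAT field (as in my file, where FORMAT is just GT). It drops every record where the named sample carries a non-reference allele, and it does not recompute AC.

```shell
# Hypothetical helper (not part of bcftools): drop every record in which the
# background sample carries a non-reference allele.
# Assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT field.
drop_background() {
  # $1 = background sample name (e.g. Strain01), $2 = input VCF
  awk -F'\t' -v bg="$1" '
    /^##/ { print; next }                      # meta lines pass through unchanged
    /^#CHROM/ {
      # locate the column of the background sample (samples start at column 10)
      for (i = 10; i <= NF; i++) if ($i == bg) col = i
      if (!col) { print "sample not found: " bg > "/dev/stderr"; exit 1 }
      print; next
    }
    {
      split($col, f, ":")                      # f[1] is the GT string
      if (f[1] !~ /[1-9]/) print               # keep only if no ALT allele is called
    }
  ' "$2"
}
```

It would be called as `drop_background Strain01 input.vcf > output.vcf`. Records where Strain01 has a missing genotype (`.`) are kept, and the AC values in INFO still count Strain01-derived alleles, so they would need to be recalculated separately.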
Is there a tool or command that lets me specify Strain01 as the background and remove its contribution from all my other strains?
Thank you in advance!