I am new to bioinformatics, so I am sorry if my question is too basic. I am following this article to filter my VCF file from the 1000 Genomes Project:
See the Methods section 'Application to human chromosome 18 from the 1000 Genomes CEU sample'.
In that section they say they masked all sites flagged by the 1000 Genomes Project as being unfit for population genetic analyses. The 1000 Genomes Project provides those sites in BED format and as FASTA files. The link is as follows:
My question is: how can I use the BED or FASTA files provided to mask those sites in a VCF file? Is there a tool that does this? Any insights would be appreciated.
I think that you do the opposite, i.e. mask a FASTA file using a VCF file. You mask a FASTA file (usually a reference genome) at the variant positions listed in a VCF file so that any downstream analysis software avoids those sites. This should help: https://bedtools.readthedocs.io/en/latest/content/tools/maskfasta.html
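To make the idea concrete, here is a minimal Python sketch of what masking a sequence at VCF positions means. It is only an illustration, not how bedtools works internally: the function names `vcf_positions` and `mask_sequence` are made up for this example, it masks only the single base at each POS (not the full REF allele span), and it assumes the sequence for one chromosome is already loaded as a string.

```python
def vcf_positions(vcf_lines, chrom):
    """Yield the 1-based POS of every VCF record on the given chromosome."""
    for line in vcf_lines:
        # Skip header lines ("##..." and "#CHROM...") and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == chrom:
            yield int(fields[1])


def mask_sequence(seq, positions_1based, mask_char="N"):
    """Return seq with the base at each 1-based position replaced by mask_char."""
    chars = list(seq)
    for pos in positions_1based:
        idx = pos - 1  # VCF positions are 1-based, Python strings are 0-based
        if 0 <= idx < len(chars):
            chars[idx] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # Toy data, invented for the example.
    vcf = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "18\t2\t.\tC\tT\n",
        "18\t5\t.\tA\tG\n",
    ]
    print(mask_sequence("ACGTACGT", vcf_positions(vcf, "18")))
```

For real genomes you would want to stream the FASTA rather than hold it in memory, which is what a dedicated tool does for you.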
Thank you @prasundutta87 for your reply! I came across vcftools, which I think I can use. I'll try that first.
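For anyone landing here later: vcftools has `--bed` and `--exclude-bed` options for keeping or dropping sites by BED interval. If you want to see the logic spelled out, below is a rough Python sketch of filtering VCF records against a BED file. The helper names (`load_bed`, `in_bed`, `filter_vcf`) are invented for this example, the BED intervals are assumed not to overlap, and chromosome names are assumed to match between the two files (e.g. `18` vs `chr18`). Check whether your mask file lists the sites to keep or the sites to drop, and set `keep_inside` accordingly.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(bed_lines):
    """Parse BED lines into {chrom: sorted list of (start, end)} (0-based, half-open)."""
    intervals = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    for chrom in intervals:
        intervals[chrom].sort()
    return intervals


def in_bed(intervals, chrom, pos_1based):
    """True if the 1-based VCF position falls inside a BED interval.

    Assumes the intervals on each chromosome do not overlap.
    """
    ivs = intervals.get(chrom, [])
    pos0 = pos_1based - 1  # convert to the 0-based coordinate BED uses
    # Index of the last interval whose start is <= pos0.
    i = bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def filter_vcf(vcf_lines, intervals, keep_inside=True):
    """Yield header lines plus the records inside (or outside) the BED intervals."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_bed(intervals, chrom, int(pos)) == keep_inside:
            yield line
```

With real files you would pass open file handles, for example `filter_vcf(open("in.vcf"), load_bed(open("mask.bed")))`, and write the yielded lines to a new VCF (the file names here are placeholders). This only handles uncompressed VCF.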