4.2 years ago
markgodek
Hi,
I have some VCFs generated by Mutect2. For each variant in these files, I want to get the allele frequency from the dbSNP and 1000 Genomes VCFs I have.
Could you recommend a tool to do this?
I considered writing a Python script to do it, but thought there must be a better way than iterating over both dbSNP and 1000 Genomes for every line in my Mutect2 output.
Thanks.
Why don't you use Ensembl VEP to annotate your VCFs?
We decided to follow the GATK best practices, so we're using their Funcotator for functional annotation, but the "Frequency data for co-located variants" function of VEP does look promising. Thanks.
Take a look at bcftools annotate. You may also want to check that your input is normalized (left-aligned, parsimonious, and with multi-allelics split) before using bcftools annotate. You can do the pre-processing steps using bcftools norm or vt decompose + vt normalize.
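A minimal sketch of the two steps above (normalize, then annotate), written as a small Python wrapper that only builds the bcftools command lines and prints them. The file names, the helper function names, and the INFO/AF tag are assumptions for illustration; check which INFO tag your dbSNP and 1000 Genomes files actually use for allele frequency (for example with bcftools view -h) before running anything. The annotation VCF passed to -a must be bgzip-compressed and indexed.

```python
import shlex
import subprocess


def norm_cmd(in_vcf, ref_fasta, out_vcf):
    """Build a bcftools norm call (hypothetical helper).

    -f left-aligns and normalizes indels against the reference FASTA;
    -m -any splits multiallelic sites into biallelic records.
    """
    return ["bcftools", "norm", "-f", ref_fasta, "-m", "-any",
            "-Oz", "-o", out_vcf, in_vcf]


def annotate_cmd(in_vcf, annot_vcf, out_vcf, columns="INFO/AF"):
    """Build a bcftools annotate call (hypothetical helper).

    -a is the annotation source, -c lists the columns to copy.
    INFO/AF is an assumed tag name; your source file may use another.
    """
    return ["bcftools", "annotate", "-a", annot_vcf, "-c", columns,
            "-Oz", "-o", out_vcf, in_vcf]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # All paths below are placeholders, not files from this thread.
    run(norm_cmd("mutect2.vcf.gz", "reference.fasta", "mutect2.norm.vcf.gz"))
    run(annotate_cmd("mutect2.norm.vcf.gz", "1000G.vcf.gz",
                     "mutect2.annotated.vcf.gz"))
```

With dry_run left at its default, this just prints the two commands so you can inspect them before running them in a shell.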
Thanks. I'm new to all this, so when vcftools and bcftools gave me errors about multiallelic sites, I just removed them with SelectVariants --restrict-alleles-to BIALLELIC.
So after pre-processing with bcftools norm, I should be able to do something like this?
with annotations.hdr being something like
Pass 100 variants through your bcftools annotate command and verify a few; that way, you will know it works. You may want to use the --collapse parameter to make sure comparisons take CHROM, POS, REF and ALT into account when matching the 2 VCFs.
Thanks for your help. I was using bcftools annotate to change chromosome names in an earlier step, but didn't know it also had this function.
I've been taking a deeper dive into bcftools and vcftools and it's really making my project easier.
Thanks again.
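For anyone wanting to do the spot check suggested above, here is a rough Python sketch that compares a handful of annotated records against the annotation source, matching on CHROM, POS, REF and ALT. It assumes both inputs are already split into biallelic records, that the frequency lives in an INFO tag called AF (an assumption; adjust the tag argument), and that values are compared as strings. The function names are made up for illustration.

```python
def parse_info(info):
    """Turn an INFO string such as 'AF=0.1;DB' into a dict."""
    out = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags have no '=' part
    return out


def records(lines):
    """Yield ((CHROM, POS, REF, ALT), info_dict) for each data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        yield (f[0], f[1], f[3], f[4]), parse_info(f[7])


def spot_check(annotated_lines, source_lines, tag="AF"):
    """Return (site, expected, observed) for every annotated record whose
    tag value differs from the source record with the same site key."""
    expected = {key: info[tag]
                for key, info in records(source_lines) if tag in info}
    problems = []
    for key, info in records(annotated_lines):
        if info.get(tag) != expected.get(key):
            problems.append((key, expected.get(key), info.get(tag)))
    return problems
```

Both arguments are any iterable of VCF text lines, so you can pass open file handles; gzip.open(path, "rt") reads bgzip-compressed VCFs as text. An empty list means every checked record agrees with the source.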