Hello
I am using the GATK pipeline to do variant calling on samples from human patients. For the tools in the GATK pipeline that require a list of known sites (e.g. BaseRecalibrator) I am using the b37 dbSNP version 137 VCF available from the GATK resource bundle. Following variant calling I am annotating the variants using bcftools annotate.
For variant annotation I wish to use the most recent version of dbSNP possible (version 151 at the time of writing). So I have two questions:
Does using a certain version of dbSNP (e.g. 137) for variant calling also mean that I should use this version for annotation, or can I use a later version without expecting trouble?
I am unaware of a dbSNP 151 VCF which is compatible with the b37 build. Can I simply use the dbSNP VCF for GRCh37 offered by NCBI, or is this likely to cause problems?
Thanks in advance
My question was whether I can use different dbSNP files in the variant calling versus the annotation step, not between BQSR and the variant calling. The reason I want to use different files in these steps is that for the variant calling I want to stick to the dbSNP VCF files provided by GATK, and these only go as far as dbSNP 138. For the annotation step I want to use the most recent version of dbSNP in order to be able to extract information on as many variants as possible. Thanks for the reminder about the chromosome naming issue. Somehow I forgot about that :)
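For anyone reading along, the chromosome naming issue can be checked before annotating. Below is a minimal Python sketch, assuming plain-text VCF lines that are already loaded; the function names and the toy records (including the rsIDs) are made up for illustration and do not come from any real file. It lists the contig names that appear in the call set but not in the dbSNP file.

```python
def contig_names(vcf_lines):
    """Collect the CHROM values found on the data lines of a VCF."""
    names = set()
    for line in vcf_lines:
        # Skip header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def naming_mismatches(calls_lines, dbsnp_lines):
    """Return contigs used in the calls that never occur in the dbSNP file."""
    return sorted(contig_names(calls_lines) - contig_names(dbsnp_lines))


# Toy records, invented for this example
calls = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "MT\t50\t.\tC\tT",
]
dbsnp = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\trs0000001\tA\tG",
    "chrM\t50\trs0000002\tC\tT",
]

print(naming_mismatches(calls, dbsnp))  # ['1', 'MT']
```

An empty list means every contig in the calls also occurs in the dbSNP file; anything else suggests the names need to be harmonised first.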
Hello,
The only thing that GATK does with the dbSNP file during variant calling is annotate the ID column. So it doesn't make sense to use an old version here and a newer version in a separate annotation step. But if you really want to do this, you can, as the dbSNP file isn't used for any calculation during variant calling (see the manual).
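To illustrate the point, here is a toy Python sketch of ID annotation. It is not GATK's or bcftools' actual implementation, just a lookup keyed on chromosome, position, REF and ALT; the function name, the records and the rsIDs are all hypothetical. Swapping the dbSNP table changes only the ID field and leaves the called sites and alleles untouched.

```python
def annotate_ids(calls, dbsnp):
    """Fill the ID field of each call from a dbSNP-like table.

    Records are (chrom, pos, id, ref, alt) tuples. Calls without a
    matching dbSNP record get "." as their ID.
    """
    known = {(c, p, r, a): rsid for c, p, rsid, r, a in dbsnp}
    return [(c, p, known.get((c, p, r, a), "."), r, a)
            for c, p, _old_id, r, a in calls]


# Hypothetical calls and two hypothetical dbSNP releases
calls = [("1", 100, ".", "A", "G"), ("1", 200, ".", "C", "T")]
old_release = [("1", 100, "rs0000001", "A", "G")]
new_release = old_release + [("1", 200, "rs0000002", "C", "T")]

with_old = annotate_ids(calls, old_release)
with_new = annotate_ids(calls, new_release)

print(with_old)
print(with_new)
```

With the older table the second call keeps "." as its ID, while the newer table gives it an rsID; the positions and alleles are identical in both results.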
fin swimmer