Can I use different versions of dbsnp for variant calling using GATK and variant annotation?
6.6 years ago
dolevrahat ▴ 40

Hello

I am using the GATK pipeline to do variant calling on human patient samples. For the tools in the GATK pipeline that require a list of known sites (e.g. BaseRecalibrator) I am using the b37 dbSNP version 137 VCF available from the GATK resource bundle. Following variant calling I am annotating the variants using bcftools annotate.

For the purpose of variant annotation I wish to use the most recent version of dbsnp possible (version 151 at the time of writing). So I have two questions:

  1. Does using a certain version of dbSNP (e.g. 137) for variant calling also mean that I should use this version for annotation, or can I use a later version without expecting trouble?

  2. I am unaware of a vcf of dbsnp151 which is compatible with the b37 build. Can I simply use the dbsnp vcf for GRCh37 offered by NCBI or is this likely to cause problems?

Thanks in advance

dbsnp gatk variant calling variant annotation • 2.4k views
6.5 years ago

Hello dolevrahat,

It should not be a problem to use different VCF files for BQSR and variant calling, as the variant caller knows nothing about the steps you applied to your data beforehand. The question is: why do you want to use different files?

To your second question: in most cases one can say NCBI build 37 = GRCh37 = hg19. You only have to pay attention to the naming convention of the chromosomes. hg19 from UCSC prefixes them with "chr"; the others don't. This might be a problem in your pipeline. Some programs do some magic to handle this difference; for others you first have to correct the naming.
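If the naming does need correcting, the idea can be sketched in a few lines of Python. This is only an illustration: the function names `to_b37_name` and `rename_vcf_line` are made up for this post, and treating "chrM" as "MT" is an assumption, since the mitochondrial reference and the unplaced contigs may differ between builds and should be checked separately. In practice `bcftools annotate --rename-chrs` with a two-column mapping file does the same job on a real VCF.

```python
import re


def to_b37_name(contig: str) -> str:
    """Map a UCSC-style contig name (chr1, chrX, chrM) to b37 style (1, X, MT)."""
    if contig == "chrM":
        # Assumption: only the name is mapped here, nothing is said about the sequence.
        return "MT"
    if contig.startswith("chr"):
        return contig[3:]
    return contig


def rename_vcf_line(line: str) -> str:
    """Rewrite the contig name in a single VCF line (no trailing newline handling)."""
    if line.startswith("##contig=<ID="):
        # Header contig declaration: rename the ID, keep the other attributes.
        return re.sub(
            r"^(##contig=<ID=)([^,>]+)",
            lambda m: m.group(1) + to_b37_name(m.group(2)),
            line,
        )
    if line.startswith("#"):
        # All other header lines pass through unchanged.
        return line
    # Data line: CHROM is the first tab-separated field.
    chrom, sep, rest = line.partition("\t")
    return to_b37_name(chrom) + sep + rest
```

Applied line by line to an uncompressed VCF, this leaves positions and alleles untouched and only changes how the contigs are named.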

fin swimmer


My question was whether I can use different dbSNP files in the variant calling versus the annotation step, not between BQSR and the variant calling. The reason I want to use different files in these steps is that for the variant calling I want to stick to the dbSNP VCF files provided by GATK, and these only go as far as dbSNP 138. For the annotation step I want to use the most recent version of dbSNP in order to be able to extract information on as many variants as possible. Thanks for the reminder about the chromosome naming issue. I somehow forgot about that :)


Hello,

The only thing that GATK does with the dbSNP file during variant calling is to annotate the ID column. So it doesn't make sense to use an old version here and a newer version in a separate annotation step. But if you really want to do this, you can, as the dbSNP file isn't used for any calculation during variant calling (see manual).
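To make the point concrete, here is a toy model of what "annotating the ID column" amounts to. It is not GATK's or bcftools' actual code: the function `annotate_ids`, the matching on chromosome, position, REF and ALT, and the rsIDs are all invented for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, str, str]            # (CHROM, POS, REF, ALT)
Record = Tuple[str, int, str, str, str]    # (CHROM, POS, ID, REF, ALT)


def annotate_ids(records: List[Record], known: Dict[Key, str]) -> List[Record]:
    """Fill the ID field from `known` where a site matches; leave the rest as is."""
    out = []
    for chrom, pos, vid, ref, alt in records:
        rsid = known.get((chrom, pos, ref, alt))
        out.append((chrom, pos, rsid if rsid else vid, ref, alt))
    return out


# Two called variants with no ID yet (made-up sites).
calls = [("1", 100, ".", "A", "G"), ("1", 200, ".", "C", "T")]

# Made-up "old" and "new" known-sites tables; the new one knows one more site.
dbsnp_old = {("1", 100, "A", "G"): "rs111"}
dbsnp_new = {("1", 100, "A", "G"): "rs111", ("1", 200, "C", "T"): "rs222"}

with_old = annotate_ids(calls, dbsnp_old)
with_new = annotate_ids(calls, dbsnp_new)
```

In this model the sites and alleles in `with_old` and `with_new` are identical and only the ID field differs, which is why re-annotating with a newer dbSNP afterwards does not conflict with the calls themselves.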

fin swimmer
