What database to use for GATK base recalibration when using the Ensembl hg19 genome
5 months ago

I am running GATK BaseRecalibrator as part of a pipeline (sarek), and it requires --dbsnp or --known_indels. I am using the regular ensembl hg38.fa.gz (https://hgdownload2.soe.ucsc.edu/goldenPath/hg38/bigZips/). The number of files in the ensembl genome download is overwhelming (https://hgdownload2.soe.ucsc.edu/downloads.html), and I don't know which dbSNP file corresponds to this genome. Do you have any suggestions? Where can I learn about the various versions and annotation files for the human genome?

ensembl gatk • 342 views
5 months ago
GenoMax 147k

Although your title refers to hg19, the rest of your post uses hg38 (the latest genome build). Assuming you are interested in hg38, the GATK resource bundle should be useful. Download all the files from there (including the genome, its indexes, and the SNP files) so that the reference and the known-sites files match: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/
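
As a rough illustration of how the bundle files might map onto a BaseRecalibrator call, here is a minimal Python sketch that only assembles the command line. The file names below are assumptions about what the bucket contains, and `base_recalibrator_cmd` is a hypothetical helper, so check the names against the bucket listing before relying on them. The `gs://` prefix is just the bucket path from the URL above.

```python
# Bucket path derived from the console URL in the answer.
BUNDLE = "gs://genomics-public-data/resources/broad/hg38/v0"

# ASSUMED file names: verify them against the bucket listing before use.
REFERENCE = "Homo_sapiens_assembly38.fasta"
KNOWN_SITES = [
    "Homo_sapiens_assembly38.dbsnp138.vcf",
    "Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
    "Homo_sapiens_assembly38.known_indels.vcf.gz",
]


def base_recalibrator_cmd(bam, out_table, local_dir="."):
    """Build a GATK4 BaseRecalibrator command as a list of arguments.

    Hypothetical helper: it assumes the bundle files were already
    downloaded (with their indexes) into `local_dir`.
    """
    cmd = [
        "gatk", "BaseRecalibrator",
        "-I", bam,
        "-R", f"{local_dir}/{REFERENCE}",
    ]
    # One --known-sites argument per known-variants file.
    for name in KNOWN_SITES:
        cmd += ["--known-sites", f"{local_dir}/{name}"]
    cmd += ["-O", out_table]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is executed here.
    print(" ".join(base_recalibrator_cmd("sample.bam", "recal.table", "bundle")))
```

In sarek the same files would be passed through the --dbsnp and --known_indels parameters mentioned in the question rather than through a hand-built command. The important part is that the reference FASTA and the known-sites VCFs come from the same bundle, so their contig names agree.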
