GATK BaseRecalibrator --known-sites VCF file
3.4 years ago
Jordi

Hi,

I am trying to run GATK's BaseRecalibrator on a BAM file aligned to the hg19 reference sequence downloaded from the UCSC website.

For the --known-sites option, I would like to use either a gnomAD or a dbSNP VCF file, downloaded from their respective websites.

The analysis works if I use the 00-common_all.vcf file from dbSNP; however, this file is based on hg38, and I cannot find the hg19 equivalent on their website.

On the other hand, the analysis does not work with any gnomAD VCF file: the chromosome names in the reference (chr1, chr2, ...) do not match those in the VCF file (1, 2, ...), and the following error occurs:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
  reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random, chr1_jh806574_fix, chr1_gl949741_fix, chr1_jh636053_fix, chr1_jh636052_fix, chr1_gl383518_alt, chr1_gl383519_alt, chr1_gl383520_alt, chr1_jh806573_fix, chr1_jh636054_fix, chr1_jh806575_fix, chr1_gl383516_fix, chr1_gl383517_fix, chr2_gl383521_alt, chr2_kb663603_fix, chr2_gl877871_fix, chr2_gl582966_alt, chr2_gl383522_alt, chr2_gl877870_fix, chr3_jh636055_alt, chr3_jh159132_fix, chr3_gl383523_fix, chr3_ke332495_fix, chr3_gl383524_fix, chr3_jh159131_fix, chr3_gl383525_fix, chr3_gl383526_alt, chr4_ke332496_fix, chr4_gl383528_alt, chr4_gl383529_alt, chr4_gl582967_fix, chr4_gl383527_alt, 
chr4_gl877872_fix, chr5_gl383532_alt, chr5_gl949742_alt, chr5_gl339449_alt, chr5_gl383530_alt, chr5_jh159133_fix, chr5_ke332497_fix, chr5_gl383531_alt, chr6_jh806576_fix, chr6_jh636057_fix, chr6_gl383533_alt, chr6_kb663604_fix, chr6_jh636056_fix, chr6_ke332498_fix, chr6_kb021644_alt, chr7_gl582970_fix, chr7_gl582969_fix, chr7_ke332499_fix, chr7_jh159134_fix, chr7_gl582972_fix, chr7_gl582968_fix, chr7_jh636058_fix, chr7_gl383534_alt, chr7_gl582971_fix, chr8_gl949743_fix, chr8_ke332500_fix, chr8_jh159135_fix, chr8_gl383535_fix, chr8_gl383536_fix, chr9_gl383539_alt, chr9_jh636059_fix, chr9_gl383540_alt, chr9_gl383541_alt, chr9_gl383542_alt, chr9_kb663605_fix, chr9_jh806579_fix, chr9_gl339450_fix, chr9_jh806577_fix, chr9_jh806578_fix, chr9_gl383537_fix, chr9_gl383538_fix, chr10_gl877873_fix, chr10_jh636060_fix, chr10_gl383543_fix, chr10_gl383545_alt, chr10_gl383546_alt, chr10_jh591181_fix, chr10_kb663606_fix, chr10_jh591183_fix, chr10_ke332501_fix, chr10_jh591182_fix, chr10_gl383544_fix, chr10_jh806580_fix, chr11_jh591184_fix, chr11_jh591185_fix, chr11_gl383547_alt, chr11_gl582973_fix, chr11_jh159136_alt, chr11_jh159137_alt, chr11_gl949744_fix, chr11_jh806581_fix, chr11_jh159143_fix, chr11_jh159141_fix, chr11_jh159139_fix, chr11_jh159142_fix, chr11_jh159140_fix, chr11_jh720443_fix, chr11_jh159138_fix, chr12_gl582974_fix, chr12_gl877875_alt, chr12_jh720444_fix, chr12_gl949745_alt, chr12_gl877876_alt, chr12_gl383549_alt, chr12_gl383550_alt, chr12_gl383552_alt, chr12_gl383553_alt, chr12_kb663607_fix, chr12_gl383551_alt, chr12_gl383548_fix, chr13_gl582975_fix, chr14_kb021645_fix, chr15_gl383554_alt, chr15_gl383555_alt, chr15_jh720445_fix, chr16_gl383556_alt, chr16_jh720446_fix, chr16_gl383557_alt, chr17_jh806582_fix, chr17_gl383563_alt, chr17_gl383562_fix, chr17_gl383561_fix, chr17_ke332502_fix, chr17_jh159145_fix, chr17_kb021646_fix, chr17_gl383560_fix, chr17_gl383559_fix, chr17_jh159146_alt, chr17_jh159148_alt, chr17_jh159147_alt, chr17_gl383564_alt, chr17_gl582976_fix, 
chr17_jh720447_fix, chr17_gl383558_fix, chr17_jh159144_fix, chr17_gl383565_alt, chr17_gl383566_alt, chr17_jh591186_fix, chr17_jh636061_fix, chr18_gl383567_alt, chr18_gl383570_alt, chr18_gl383571_alt, chr18_gl383568_alt, chr18_gl383569_alt, chr18_gl383572_alt, chr19_jh159149_fix, chr19_gl582977_fix, chr19_gl383573_alt, chr19_gl383575_alt, chr19_gl383576_alt, chr19_gl383574_alt, chr19_ke332505_fix, chr19_kb021647_fix, chr19_gl949746_alt, chr19_gl949747_alt, chr19_gl949748_alt, chr19_gl949749_alt, chr19_gl949750_alt, chr19_gl949751_alt, chr19_gl949752_alt, chr19_gl949753_alt, chr20_gl383577_alt, chr20_jh720448_fix, chr20_kb663608_fix, chr20_gl582979_fix, chr21_gl383578_alt, chr21_gl383579_alt, chr21_gl383580_alt, chr21_gl383581_alt, chr21_ke332506_fix, chr22_jh806584_fix, chr22_jh806583_fix, chr22_jh806585_fix, chr22_jh720449_fix, chr22_gl383583_alt, chr22_gl383582_alt, chr22_kb663609_alt, chr22_jh806586_fix, chrX_gl877877_fix, chrX_jh720451_fix, chrX_jh720452_fix, chrX_jh806589_fix, chrX_kb021648_fix, chrX_jh806590_fix, chrX_jh806587_fix, chrX_jh806591_fix, chrX_jh806592_fix, chrX_jh720453_fix, chrX_jh720454_fix, chrX_jh806593_fix, chrX_jh806594_fix, chrX_jh806595_fix, chrX_jh720455_fix, chrX_jh806588_fix, chrX_jh806601_fix, chrX_jh806602_fix, chrX_jh806603_fix, chrX_jh806596_fix, chrX_jh806597_fix, chrX_jh806598_fix, chrX_jh806599_fix, chrX_jh806600_fix, chrX_jh159150_fix, chrMT]
  features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y]
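
The error is purely about names: GATK compares the contig names declared by the reference with those in the VCF and finds no name in common, even though chr1 (hg19) and 1 (GRCh37) denote the same chromosome. A minimal Python sketch of that comparison, using lists abbreviated from the message above (`shared_contigs` is a hypothetical helper, not a GATK function):

```python
# Hypothetical helper that mimics the contig-name check failing above.
# The contig lists are abbreviated from the error message.
def shared_contigs(reference, features):
    """Return the contig names present in both the reference and the feature file."""
    return sorted(set(reference) & set(features))

reference_contigs = ["chr1", "chr2", "chrX", "chrY", "chrM", "chrUn_gl000238"]
feature_contigs = [str(i) for i in range(1, 23)] + ["X", "Y"]

overlap = shared_contigs(reference_contigs, feature_contigs)
print(overlap)  # → []
```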

Does anyone have a VCF file from either database that is correctly formatted for use with hg19? How would you proceed?

Thanks a lot for any input.

3.4 years ago
Ram

You have a good grasp of the exact problem you're facing. Here are a few options to consider:

  1. Rename the chromosomes in the gnomAD VCF using bcftools annotate --rename-chrs.
  2. Get an hg19 dbSNP VCF file; it should be available if you dig a little deeper. EDIT: It took just a few extra seconds of navigating from the link you pasted to reach this directory: https://ftp.ncbi.nih.gov/snp/organisms/human_9606_b150_GRCh37p13/VCF/
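
Option 1 needs a rename map. bcftools annotate --rename-chrs reads a plain text file with one whitespace-separated "old new" pair per line. The Python below is a minimal sketch, not part of bcftools (`build_rename_map`, `write_rename_file` and `chr_map.txt` are hypothetical names), that writes such a file for the gnomAD contigs shown in the error message, mapping 1 to 22, X and Y to chr1 to chr22, chrX and chrY. MT is deliberately left out because it does not appear in that feature list.

```python
# Sketch: write a rename map for `bcftools annotate --rename-chrs`,
# converting GRCh37-style names (1..22, X, Y) to UCSC hg19-style names.
# Assumption: only the contigs listed in the error message need mapping.
from pathlib import Path

def build_rename_map(contigs):
    """Map each GRCh37-style contig name to its 'chr'-prefixed UCSC name."""
    return {name: f"chr{name}" for name in contigs}

def write_rename_file(mapping, path):
    """Write one 'old<TAB>new' pair per line (any whitespace separator works)."""
    lines = [f"{old}\t{new}" for old, new in mapping.items()]
    Path(path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    gnomad_contigs = [str(i) for i in range(1, 23)] + ["X", "Y"]
    write_rename_file(build_rename_map(gnomad_contigs), "chr_map.txt")
```

The map could then be applied with something like `bcftools annotate --rename-chrs chr_map.txt -Oz -o gnomad.renamed.vcf.gz gnomad.vcf.gz` and indexed (for example with `tabix -p vcf gnomad.renamed.vcf.gz`) before being passed to --known-sites; the file names are placeholders.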

The dbSNP file should do the trick. Thanks a lot!

I've moved my comment to an answer. Can you please accept it to mark the post as resolved?

This was exactly what I wanted to know. Thank you, Ram, for the information, and thanks, Jordi, for asking. I had the same problem as Jordi, and I solved it by using 00-All.vcf.gz from the GATK folder.

Regarding these options:

  1. Don't we lose information by just renaming the chromosomes (e.g., the mitochondrial data)?
  2. The link you provided is for GRCh37p13, which is not 100% identical to hg19, the reference Jordi is using (see https://gatk.broadinstitute.org/hc/en-us/articles/360035890711-GRCh37-hg19-b37-humanG1Kv37-Human-Reference-Discrepancies#comparison). So again, wouldn't Jordi lose information?

How do we lose information by renaming chrM -> MT?

GRCh37p13 is not 100% identical to hg19, but I don't think it makes a big difference. I can't guarantee that no information will be lost, but the odds are really low. At some point, it comes down to how much you trust the tool. Plus, "hg19" is itself ambiguous: what it contains depends on where it was downloaded from.
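
To see why a rename by itself drops nothing, it helps to look at what it touches. A minimal sketch, assuming a tab-separated VCF data line (the record and the `rename_chrom` helper are hypothetical illustrations, not bcftools internals): only the CHROM column changes, while POS, REF, ALT and everything else pass through unchanged. The flip side is that a rename cannot check that both names refer to the same underlying sequence; that still has to be true for the coordinates to stay meaningful.

```python
# Sketch: what a chromosome rename does to a single VCF data line.
# Only column 1 (CHROM) changes; all other columns are untouched.
def rename_chrom(vcf_line, mapping):
    """Rename the CHROM field of a VCF data line; header lines pass through."""
    if vcf_line.startswith("#"):
        return vcf_line
    fields = vcf_line.rstrip("\n").split("\t")
    fields[0] = mapping.get(fields[0], fields[0])  # unmapped names are kept as-is
    return "\t".join(fields)

record = "chrM\t73\t.\tA\tG\t.\tPASS\t."  # hypothetical example record
renamed = rename_chrom(record, {"chrM": "MT"})
print(renamed)
```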

Thank you for the response. You are right about chrM -> MT; I meant all the others, such as chrUn_gl000238 and similar contigs. Aren't these important for the analysis?

Aren't these important for the analysis?

The answer is "maybe". You need to decide that as the end user. For most people, unplaced contigs and haplotypes do not make a difference in their experiments. But if you happen to be working specifically on haplotypes, they would make a big difference.
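
Whether those contigs matter depends on the experiment, but it is easy to list which ones are affected. A minimal sketch, assuming the gnomAD VCF has already been renamed to chr1 to chr22, chrX and chrY, and using a reference list abbreviated from the error message (`contigs_without_known_sites` is a hypothetical helper). As I understand BQSR, no positions on these contigs would be masked as known variation.

```python
# Sketch: list reference contigs that receive no known sites at all.
# Assumption: the known-sites VCF already uses chr-prefixed names.
def contigs_without_known_sites(reference, known_sites):
    """Return reference contigs, in reference order, absent from the known-sites VCF."""
    covered = set(known_sites)
    return [c for c in reference if c not in covered]

reference_contigs = ["chr1", "chr2", "chrX", "chrY", "chrM",
                     "chrUn_gl000238", "chr6_ssto_hap7"]
renamed_features = [f"chr{c}" for c in [str(i) for i in range(1, 23)] + ["X", "Y"]]
print(contigs_without_known_sites(reference_contigs, renamed_features))
# → ['chrM', 'chrUn_gl000238', 'chr6_ssto_hap7']
```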