Hi,
I am trying to run GATK's BaseRecalibrator on a BAM file created with the hg19 reference sequence downloaded from the UCSC website.
For the --known-sites option I would like to use either a gnomAD .vcf file or a dbSNP .vcf file, downloaded from their respective websites.
The analysis works if I use the 00-common_all.vcf file from dbSNP; however, this file was created on hg38, and I cannot find the hg19 equivalent on their website.
The analysis does not work, on the other hand, when I provide any gnomAD .vcf file: the chromosome nomenclature of the reference does not match the one in the .vcf file, and the following error occurs:
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chrX, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr20, chrY, chr19, chr22, chr21, chr6_ssto_hap7, chr6_mcf_hap5, chr6_cox_hap2, chr6_mann_hap4, chr6_apd_hap1, chr6_qbl_hap6, chr6_dbb_hap3, chr17_ctg5_hap1, chr4_ctg9_hap1, chr1_gl000192_random, chrUn_gl000225, chr4_gl000194_random, chr4_gl000193_random, chr9_gl000200_random, chrUn_gl000222, chrUn_gl000212, chr7_gl000195_random, chrUn_gl000223, chrUn_gl000224, chrUn_gl000219, chr17_gl000205_random, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chr9_gl000199_random, chrUn_gl000211, chrUn_gl000213, chrUn_gl000220, chrUn_gl000218, chr19_gl000209_random, chrUn_gl000221, chrUn_gl000214, chrUn_gl000228, chrUn_gl000227, chr1_gl000191_random, chr19_gl000208_random, chr9_gl000198_random, chr17_gl000204_random, chrUn_gl000233, chrUn_gl000237, chrUn_gl000230, chrUn_gl000242, chrUn_gl000243, chrUn_gl000241, chrUn_gl000236, chrUn_gl000240, chr17_gl000206_random, chrUn_gl000232, chrUn_gl000234, chr11_gl000202_random, chrUn_gl000238, chrUn_gl000244, chrUn_gl000248, chr8_gl000196_random, chrUn_gl000249, chrUn_gl000246, chr17_gl000203_random, chr8_gl000197_random, chrUn_gl000245, chrUn_gl000247, chr9_gl000201_random, chrUn_gl000235, chrUn_gl000239, chr21_gl000210_random, chrUn_gl000231, chrUn_gl000229, chrM, chrUn_gl000226, chr18_gl000207_random, chr1_jh806574_fix, chr1_gl949741_fix, chr1_jh636053_fix, chr1_jh636052_fix, chr1_gl383518_alt, chr1_gl383519_alt, chr1_gl383520_alt, chr1_jh806573_fix, chr1_jh636054_fix, chr1_jh806575_fix, chr1_gl383516_fix, chr1_gl383517_fix, chr2_gl383521_alt, chr2_kb663603_fix, chr2_gl877871_fix, chr2_gl582966_alt, chr2_gl383522_alt, chr2_gl877870_fix, chr3_jh636055_alt, chr3_jh159132_fix, chr3_gl383523_fix, chr3_ke332495_fix, chr3_gl383524_fix, chr3_jh159131_fix, chr3_gl383525_fix, chr3_gl383526_alt, chr4_ke332496_fix, chr4_gl383528_alt, chr4_gl383529_alt, chr4_gl582967_fix, chr4_gl383527_alt, 
chr4_gl877872_fix, chr5_gl383532_alt, chr5_gl949742_alt, chr5_gl339449_alt, chr5_gl383530_alt, chr5_jh159133_fix, chr5_ke332497_fix, chr5_gl383531_alt, chr6_jh806576_fix, chr6_jh636057_fix, chr6_gl383533_alt, chr6_kb663604_fix, chr6_jh636056_fix, chr6_ke332498_fix, chr6_kb021644_alt, chr7_gl582970_fix, chr7_gl582969_fix, chr7_ke332499_fix, chr7_jh159134_fix, chr7_gl582972_fix, chr7_gl582968_fix, chr7_jh636058_fix, chr7_gl383534_alt, chr7_gl582971_fix, chr8_gl949743_fix, chr8_ke332500_fix, chr8_jh159135_fix, chr8_gl383535_fix, chr8_gl383536_fix, chr9_gl383539_alt, chr9_jh636059_fix, chr9_gl383540_alt, chr9_gl383541_alt, chr9_gl383542_alt, chr9_kb663605_fix, chr9_jh806579_fix, chr9_gl339450_fix, chr9_jh806577_fix, chr9_jh806578_fix, chr9_gl383537_fix, chr9_gl383538_fix, chr10_gl877873_fix, chr10_jh636060_fix, chr10_gl383543_fix, chr10_gl383545_alt, chr10_gl383546_alt, chr10_jh591181_fix, chr10_kb663606_fix, chr10_jh591183_fix, chr10_ke332501_fix, chr10_jh591182_fix, chr10_gl383544_fix, chr10_jh806580_fix, chr11_jh591184_fix, chr11_jh591185_fix, chr11_gl383547_alt, chr11_gl582973_fix, chr11_jh159136_alt, chr11_jh159137_alt, chr11_gl949744_fix, chr11_jh806581_fix, chr11_jh159143_fix, chr11_jh159141_fix, chr11_jh159139_fix, chr11_jh159142_fix, chr11_jh159140_fix, chr11_jh720443_fix, chr11_jh159138_fix, chr12_gl582974_fix, chr12_gl877875_alt, chr12_jh720444_fix, chr12_gl949745_alt, chr12_gl877876_alt, chr12_gl383549_alt, chr12_gl383550_alt, chr12_gl383552_alt, chr12_gl383553_alt, chr12_kb663607_fix, chr12_gl383551_alt, chr12_gl383548_fix, chr13_gl582975_fix, chr14_kb021645_fix, chr15_gl383554_alt, chr15_gl383555_alt, chr15_jh720445_fix, chr16_gl383556_alt, chr16_jh720446_fix, chr16_gl383557_alt, chr17_jh806582_fix, chr17_gl383563_alt, chr17_gl383562_fix, chr17_gl383561_fix, chr17_ke332502_fix, chr17_jh159145_fix, chr17_kb021646_fix, chr17_gl383560_fix, chr17_gl383559_fix, chr17_jh159146_alt, chr17_jh159148_alt, chr17_jh159147_alt, chr17_gl383564_alt, chr17_gl582976_fix, 
chr17_jh720447_fix, chr17_gl383558_fix, chr17_jh159144_fix, chr17_gl383565_alt, chr17_gl383566_alt, chr17_jh591186_fix, chr17_jh636061_fix, chr18_gl383567_alt, chr18_gl383570_alt, chr18_gl383571_alt, chr18_gl383568_alt, chr18_gl383569_alt, chr18_gl383572_alt, chr19_jh159149_fix, chr19_gl582977_fix, chr19_gl383573_alt, chr19_gl383575_alt, chr19_gl383576_alt, chr19_gl383574_alt, chr19_ke332505_fix, chr19_kb021647_fix, chr19_gl949746_alt, chr19_gl949747_alt, chr19_gl949748_alt, chr19_gl949749_alt, chr19_gl949750_alt, chr19_gl949751_alt, chr19_gl949752_alt, chr19_gl949753_alt, chr20_gl383577_alt, chr20_jh720448_fix, chr20_kb663608_fix, chr20_gl582979_fix, chr21_gl383578_alt, chr21_gl383579_alt, chr21_gl383580_alt, chr21_gl383581_alt, chr21_ke332506_fix, chr22_jh806584_fix, chr22_jh806583_fix, chr22_jh806585_fix, chr22_jh720449_fix, chr22_gl383583_alt, chr22_gl383582_alt, chr22_kb663609_alt, chr22_jh806586_fix, chrX_gl877877_fix, chrX_jh720451_fix, chrX_jh720452_fix, chrX_jh806589_fix, chrX_kb021648_fix, chrX_jh806590_fix, chrX_jh806587_fix, chrX_jh806591_fix, chrX_jh806592_fix, chrX_jh720453_fix, chrX_jh720454_fix, chrX_jh806593_fix, chrX_jh806594_fix, chrX_jh806595_fix, chrX_jh720455_fix, chrX_jh806588_fix, chrX_jh806601_fix, chrX_jh806602_fix, chrX_jh806603_fix, chrX_jh806596_fix, chrX_jh806597_fix, chrX_jh806598_fix, chrX_jh806599_fix, chrX_jh806600_fix, chrX_jh159150_fix, chrMT]
features contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y]
Does anyone have any input .vcf file from either database correctly formatted to be used for this purpose on hg19? How would you proceed?
Thanks a lot for any input.
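For anyone who wants to confirm this kind of mismatch before launching GATK, here is a minimal sketch in plain Python. It only compares names: it collects the contig names used in the CHROM column of a VCF and checks which of them also appear in a list of reference contig names. The function names (`vcf_contigs`, `contig_overlap`) and the tiny in-memory example are illustrative assumptions, not part of GATK or any other tool.

```python
def vcf_contigs(lines):
    """Collect contig names from the CHROM column of VCF data lines,
    in order of first appearance. Header lines (starting with '#') are skipped."""
    names = []
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.add(chrom)
            names.append(chrom)
    return names


def contig_overlap(reference_names, feature_names):
    """Return the feature contig names that also exist in the reference."""
    reference = set(reference_names)
    return [name for name in feature_names if name in reference]


# Hypothetical miniature inputs that mimic the situation in the error above:
# UCSC-style names in the reference, bare names in the VCF.
reference = ["chr1", "chr2", "chrX", "chrM"]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "X\t200\t.\tC\tT\t.\tPASS\t.",
]

features = vcf_contigs(vcf_lines)
shared = contig_overlap(reference, features)
print("feature contigs:", features)
print("shared with reference:", shared)
```

An empty "shared" list corresponds to the "No overlapping contigs found" situation. For a real file, the lines could come from an opened VCF (via `gzip.open(path, "rt")` if it is compressed) and the reference names from the first column of the FASTA index.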
The dbSNP file should do the trick. Thanks a lot!
I've moved my comment to an answer. Can you please accept it to mark the post resolved?
This is just what I wanted to know. Thank you, Ram, for the information, and thanks also to Jordi for raising the topic. I ran into the same problem as Jordi, and it was solved by using the 00-All.vcf.gz file in the gatk folder.
Regarding these options:
How do we lose information by renaming chrM -> MT?
While GRCh37.p13 is not 100% identical to hg19, I don't think it makes a huge difference. I can't guarantee that no information will be lost, but the odds of losing information are really low. At some point, it boils down to how much you trust the tool. Plus, "hg19" is itself ambiguous, since what you get depends on where it was downloaded from.
Thank you for the response. You are right about chrM -> MT; I meant all the others, such as chrUn_gl000238 and similar contigs. Aren't these important for the analysis?
The answer is "maybe". You need to decide that as the end user. For most people, unplaced contigs and haplotypes do not make a difference in their experiments. But if you happen to be working specifically on haplotypes, then it would make a big difference.
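To make the trade-off concrete, here is a rough sketch of the renaming approach, assuming the VCF and the reference really are the same assembly in coordinates and differ only in naming. It prefixes every bare contig name with "chr" and drops any record or `##contig` header whose renamed contig is not in the reference, which is exactly where unplaced contigs and haplotypes with different naming schemes would be lost. The names `to_ucsc_name` and `rename_vcf_lines` are made up for this example. Note that the reference in the question lists both chrM and chrMT, so the sketch does not try to decide how MT should be mapped; it just applies the plain prefix.

```python
import re

# Matches a VCF "##contig=<ID=...>" header line and isolates the contig ID.
CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)(.*)$")


def to_ucsc_name(contig):
    """Prefix a bare contig name with 'chr' (assumed naming rule for this sketch)."""
    return contig if contig.startswith("chr") else "chr" + contig


def rename_vcf_lines(lines, reference_names):
    """Yield VCF lines with renamed contigs, dropping contigs absent from the reference."""
    reference = set(reference_names)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        header = CONTIG_HEADER.match(line)
        if header:
            new_name = to_ucsc_name(header.group(2))
            if new_name in reference:
                yield header.group(1) + new_name + header.group(3)
            continue
        if line.startswith("#"):
            # Other header lines pass through unchanged.
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        new_name = to_ucsc_name(chrom)
        if new_name in reference:
            yield new_name + sep + rest


# Hypothetical miniature example.
reference = ["chr1", "chrX"]
vcf_lines = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=249250621>",
    "##contig=<ID=GL000207.1,length=4262>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\t.",
    "GL000207.1\t5\t.\tC\tT\t.\tPASS\t.",
    "X\t200\t.\tC\tT\t.\tPASS\t.",
]

renamed = list(rename_vcf_lines(vcf_lines, reference))
for out_line in renamed:
    print(out_line)
```

In the example, the GL000207.1 record and its header are silently dropped because "chrGL000207.1" is not a reference contig; whether that loss matters is the "maybe" above. A rewritten VCF would still need to be compressed and indexed again before GATK can use it.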