Hey guys,
I'm having a problem using a mouse dbSNP VCF file for variant annotation with GATK's VariantAnnotator. The dbSNP VCF names the chromosomes chr1, chr2, ..., but my reference uses a different convention: 1, 2, 3, and so on. Running GATK produces the following error:
Input files dbSNP.vcf and reference have incompatible contigs: No overlapping contigs found.
ERROR dbSNP.vcf contigs = [chr1, chr10, chr11, chr12, chr13, chr13_random, chr14, chr15, chr16, chr17, chr17_random, chr18, chr19, chr1_random, chr2, chr3, chr3_random, chr4, chr4_random, chr5, chr5_random, chr6, chr7, chr7_random, chr8, chr8_random, chr9, chr9_random, chrM, chrUn_random, chrX, chrX_random, chrY, chrY_random]
ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, NT_166325, NT_166464, NT_166452, NT_166480, NT_166448, NT_166458, NT_166443, NT_166466, NT_166476, NT_166479, NT_166478, NT_166474, NT_166471, NT_166445, NT_166465, NT_166457, NT_166470, NT_166454, NT_166472, NT_166449, NT_166481, NT_166337, NT_166459, NT_166456, NT_166473, NT_166461, NT_166475, NT_166462, NT_166444, NT_166453, NT_166446, NT_166469, NT_072868, NT_166335, NT_166467, NT_166283, NT_166338, NT_166340, NT_166442, NT_166334, NT_166286, NT_166451, NT_166336, NT_166339, NT_166290, NT_053651, NT_166450, NT_166447, NT_166468, NT_166460, NT_166477, NT_166455, NT_166291, NT_166463, NT_166433, NT_166402, NT_166327, NT_166308, NT_166309, NT_109319, NT_166282, NT_166314, NT_166303, NT_112000, NT_110857, NT_166280, NT_166375, NT_166311, NT_166307, NT_166310, NT_166323, NT_166437, NT_166374, NT_166364, NT_166439, NT_166328, NT_166438, NT_166389, NT_162750, NT_166436, NT_166372, NT_166440, NT_166326, NT_166342, NT_166333, NT_166435, NT_166434, NT_166341, NT_166376, NT_166387, NT_166281, NT_166313, NT_166380, NT_166360, NT_166441, NT_166359, NT_166386, NT_166356, NT_166357, NT_166423, NT_166384, NT_161879, NT_161928, NT_166388, NT_161919, NT_166381, NT_166367, NT_166392, NT_166406, NT_166365, NT_166379, NT_166358, NT_161913, NT_166378, NT_166382, NT_161926, NT_166345, NT_166385, NT_165789, NT_166368, NT_166405, NT_166390, NT_166373, NT_166361, NT_166348, NT_166369, NT_161898, NT_166417, NT_166410, NT_166383, NT_166362, NT_165754, NT_166366, NT_166363, NT_161868, NT_166407, NT_165793, NT_166352, NT_161925, NT_166412, NT_165792, NT_161924, NT_166422, NT_165795, NT_166354, NT_166350, NT_165796, NT_161904, NT_166370, NT_165798, NT_165791, NT_161885, NT_166424, NT_166346, NT_165794, NT_166377, NT_166418, NT_161877, NT_166351, NT_166408, NT_166349, NT_161906, NT_166391, NT_161892, NT_166415, NT_165790, NT_166420, NT_166353, NT_166344, NT_166371, NT_161895, NT_166404, NT_166413, NT_166419, NT_161916, NT_166347, NT_161875, NT_161911, NT_161897, NT_161866, NT_166409, NT_161872, NT_166403, NT_161902, NT_166414, NT_166416, NT_166421, NT_161923, NT_161937]
Is there a simple way, or a script, to change the chromosome names in the dbSNP file to match the ones used in the reference?
Thanks
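A minimal Python sketch of one way to do the renaming asked about above, assuming an uncompressed VCF and a samtools `.fai` index sitting next to the reference FASTA. It strips the `chr` prefix, maps `chrM` to `MT`, and drops data records and `##contig` header lines whose renamed contig does not appear in the reference index. With the contig lists above, that means the `chr*_random` and `chrUn_random` records are dropped, because names like `13_random` are not among the reference contigs. The function names (`ucsc_to_ensembl`, `rename_vcf_lines`, `read_fai_names`, `rename_vcf_file`) and the file paths are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Set

# Captures the ID value of a "##contig=<ID=...,...>" header line.
CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)(.*)$")


def ucsc_to_ensembl(name: str) -> str:
    """Map UCSC-style names (chr1, chrX, chrM) to the 1 / X / MT style.

    Assumption: for the primary chromosomes the two schemes differ only by
    the "chr" prefix, plus chrM -> MT.
    """
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_vcf_lines(lines: Iterable[str], keep: Optional[Set[str]] = None) -> Iterator[str]:
    """Rename contigs in ##contig headers and data records.

    Lines whose renamed contig is not in `keep` are dropped; pass keep=None
    to rename without filtering. Other header lines pass through unchanged.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##contig="):
            m = CONTIG_ID.match(line)
            if m:
                new = ucsc_to_ensembl(m.group(2))
                if keep is not None and new not in keep:
                    continue  # contig absent from the reference
                line = m.group(1) + new + m.group(3)
        elif line and not line.startswith("#"):
            # Data record: CHROM is the first tab-separated column.
            chrom, sep, rest = line.partition("\t")
            new = ucsc_to_ensembl(chrom)
            if keep is not None and new not in keep:
                continue  # record on a contig absent from the reference
            line = new + sep + rest
        yield line + "\n"


def read_fai_names(fai_path: str) -> Set[str]:
    """Contig names from a samtools .fai index (its first column)."""
    with open(fai_path) as fai:
        return {row.split("\t", 1)[0] for row in fai if row.strip()}


def rename_vcf_file(fai_path: str, src: str, dst: str) -> None:
    """Hypothetical helper: write a renamed, filtered copy of an uncompressed VCF."""
    keep = read_fai_names(fai_path)
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(rename_vcf_lines(fin, keep))
```

A call such as `rename_vcf_file("reference.fa.fai", "dbSNP.vcf", "dbSNP.renamed.vcf")` would write the renamed copy (paths are placeholders). Renaming only changes the labels, so it assumes both files are on the same assembly build. For the opposite case described in the reply below (a VCF with 1, 2, 3 and a reference with chr1, chr2, chr3), `ucsc_to_ensembl` would need to be replaced by the inverse mapping.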
Hi Leandro,
I'm also sequencing mouse. However, when I ran GATK, I found that my dbSNP.vcf lists the chromosomes in a different order from the reference, and that my VCF names the chromosomes 1, 2, 3 while the reference uses chr1, chr2, chr3, the exact opposite of your situation. Where did you download dbSNP from? I downloaded dbSNP from Sanger and the reference from Illumina.
Also, did you index the reference sequence with BWA? I downloaded the reference from Illumina because it comes already indexed.
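The reply also mentions that the dbSNP file's chromosome order differs from the reference's, which GATK can reject even when the names match. A hedged sketch of one fix, applied after any renaming: sort the data records by the contig order in the reference's samtools `.fai` index, then by position. The function names (`read_fai_order`, `sort_vcf_lines`) are hypothetical; the sketch loads the whole file into memory, leaves header lines (including any `##contig` lines) untouched, and drops records on contigs absent from the index. For a full-size dbSNP file, a dedicated tool such as Picard's SortVcf, which can take a sequence dictionary, may be a better fit.

```python
from typing import Iterable, List, Tuple


def read_fai_order(fai_lines: Iterable[str]) -> List[str]:
    """Contig names in reference order: the first column of a samtools .fai index."""
    return [row.split("\t", 1)[0] for row in fai_lines if row.strip()]


def sort_vcf_lines(lines: Iterable[str], contig_order: List[str]) -> List[str]:
    """Return header lines unchanged, then records sorted by (reference rank, POS)."""
    rank = {name: i for i, name in enumerate(contig_order)}
    header: List[str] = []
    records: List[Tuple[int, int, str]] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t", 2)  # CHROM, POS, rest of the record
        chrom, pos = fields[0], int(fields[1])
        if chrom not in rank:
            continue  # contig absent from the reference: dropped in this sketch
        records.append((rank[chrom], pos, line))
    # Python's sort is stable, so records with equal (rank, POS) keep input order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return header + [rec[2] for rec in records]
```

Usage would look like `order = read_fai_order(open("reference.fa.fai"))` followed by `sort_vcf_lines(open("dbSNP.vcf"), order)`, with the placeholder paths replaced by the real files.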