Hi everyone, I am doing RNA-seq variant calling on mouse. I am using the reference, indels, and SNPs for mouse from the same source [ftp://ftp-mouse.sanger.ac.uk/]. The indels file (indels.dbSNP142) did not cause any issues with the indel realigner, but the SNPs file (snps.dbSNP142) throws the following error with BaseRecalibrator:
java -jar ${GATK}/GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-R ${WHOLEGENOME} \
-I ${WHERE}/${CURRENT}-realigned.bam \
-knownSites ${DBSNP} \
-o ${WHERE}/${CURRENT}.recal_data.table
ERROR MESSAGE: Input files snps.dbSNP142.vcf and reference have incompatible contigs. Error details: The contig order in snps.dbSNP142.vcf and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
##### ERROR snps.dbSNP142.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X, Y, MT]
##### ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, JH584295.1, JH584292.1, GL456368.1, GL456396.1, GL456359.1, GL456382.1, GL456392.1, GL456394.1, GL456390.1, GL456387.1, GL456381.1, GL456370.1, GL456372.1, GL456389.1, GL456378.1, GL456360.1, GL456385.1, GL456383.1, GL456213.1, GL456239.1, GL456367.1, GL456366.1, GL456393.1, GL456216.1, GL456379.1, JH584304.1, GL456212.1, JH584302.1, JH584303.1, GL456210.1, GL456219.1, JH584300.1, JH584298.1, JH584294.1, GL456354.1, JH584296.1, JH584297.1, GL456221.1, JH584293.1, GL456350.1, GL456211.1, JH584301.1, GL456233.1, JH584299.1]
I also tried this, following the link in the error:
java -jar ${PICARD}/picard.jar SortVcf \
I= ${DBSNP} \
O= sorted.vcf \
SEQUENCE_DICTIONARY= GRCm38_68.dict
But then I got:
Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=X,length=171031299,dict_index=19,assembly=null) was found when SAMSequenceRecord(name=MT,length=16299,dict_index=19,assembly=null) was expected. at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126) at picard.vcf.SortVcf.doWork(SortVcf.java:95) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:228) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104) Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=X,length=171031299,dict_index=19,assembly=null) was found when SAMSequenceRecord(name=MT,length=16299,dict_index=19,assembly=null) was expected. at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170) at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124) ... 4 more make: * [sortvcf] Error 1 sortvc
Any hint?
Thanks
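To see at a glance what the GATK message is reporting, the two contig lists can be compared directly. The following is only a minimal sketch: `compare_contigs` is a hypothetical helper written for this thread, not part of GATK or Picard, and the reference list is shortened to two scaffolds for brevity.

```python
def compare_contigs(vcf_contigs, ref_contigs):
    """Summarize how two contig lists differ (membership and relative order)."""
    vcf_set, ref_set = set(vcf_contigs), set(ref_contigs)
    # Restrict each list to the contigs both files share, keeping its own order.
    shared_in_ref_order = [c for c in ref_contigs if c in vcf_set]
    shared_in_vcf_order = [c for c in vcf_contigs if c in ref_set]
    return {
        "only_in_vcf": [c for c in vcf_contigs if c not in ref_set],
        "only_in_ref": [c for c in ref_contigs if c not in vcf_set],
        "same_order": shared_in_ref_order == shared_in_vcf_order,
    }


# Contig lists taken from the error message above.
vcf = [str(i) for i in range(1, 20)] + ["X", "Y", "MT"]
ref = (
    ["1"]
    + [str(i) for i in range(10, 20)]
    + [str(i) for i in range(2, 10)]
    + ["MT", "X", "Y"]
    + ["JH584295.1", "JH584292.1"]  # assumption: scaffold list truncated here
)

report = compare_contigs(vcf, ref)
print(report)
```

With these inputs the report shows that every VCF contig exists in the reference, that the reference carries extra scaffolds, and that the shared contigs are in a different order (numeric in the VCF, lexicographic in the reference).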
When you did the original alignment you did not include unplaced and unlocalized contigs in your reference. The solution you linked to is only applicable when the sort order is wrong but there are no mismatches. I suppose you could remove lines with the offending references from your SNP reference.
I did not understand this part: "unplaced and unlocalized"? And do you mean that I should manually remove those extra entries from the SNP file? [JH584295.1, JH584292.1, GL456368.1, GL456396.1, GL456359.1, GL456382.1, GL456392.1, GL456394.1, GL456390.1, GL456387.1, GL456381.1, GL456370.1, GL456372.1, GL456389.1, GL456378.1, GL456360.1, GL456385.1, GL456383.1, GL456213.1, GL456239.1, GL456367.1, GL456366.1, GL456393.1, GL456216.1, GL456379.1, JH584304.1, GL456212.1, JH584302.1, JH584303.1, GL456210.1, GL456219.1, JH584300.1, JH584298.1, JH584294.1, GL456354.1, JH584296.1, JH584297.1, GL456221.1, JH584293.1, GL456350.1, GL456211.1, JH584301.1, GL456233.1, JH584299.1] Thanks
As with the human genome, those GL* and JH* contigs are known to be present in the mouse genome, but their precise location is not known.
Did you delete the index file before running SortVcf?
No, I did not delete anything.
@Goutham says this in the post you linked above.
This is what I don't understand: which index do they mean? The index I downloaded with the reference? The index of the SNP file has already been deleted.
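For the sort-order part of the problem discussed in this thread, the idea of reordering a VCF to follow the reference dictionary can be sketched in plain Python. This is an illustration under assumptions, not a replacement for Picard SortVcf: `dict_contig_order` and `reorder_vcf` are hypothetical helpers, the input is assumed to be an uncompressed VCF small enough to hold in memory, and the toy records and lengths below are made up.

```python
import re


def dict_contig_order(dict_lines):
    """Return contig names from the @SQ lines of a .dict file, in file order."""
    order = []
    for line in dict_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    order.append(field[3:])
    return order


_CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def reorder_vcf(vcf_lines, order):
    """Reorder ##contig header lines and records to follow `order`."""
    rank = {name: i for i, name in enumerate(order)}
    meta, contig_meta, column_header, records = [], {}, None, []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        match = _CONTIG_ID.match(line)
        if match:
            contig_meta[match.group(1)] = line
        elif line.startswith("##"):
            meta.append(line)
        elif line.startswith("#"):
            column_header = line
        else:
            chrom, pos = line.split("\t", 2)[:2]
            if chrom not in rank:
                raise ValueError(f"contig {chrom!r} is not in the reference dictionary")
            records.append((rank[chrom], int(pos), line))
    # Sort by reference contig rank, then position; ties keep their input order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    out = list(meta)
    # Header contigs absent from the reference dictionary are dropped here.
    out += [contig_meta[name] for name in order if name in contig_meta]
    if column_header is not None:
        out.append(column_header)
    out += [rec[2] for rec in records]
    return out


# Toy inputs (made up for illustration; not the real GRCm38 files).
dict_lines = ["@HD\tVN:1.5", "@SQ\tSN:1\tLN:100", "@SQ\tSN:10\tLN:100", "@SQ\tSN:2\tLN:100"]
vcf_lines = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=1>",
    "##contig=<ID=2>",
    "##contig=<ID=10>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t5\t.\tA\tG\t.\t.\t.",
    "2\t7\t.\tC\tT\t.\t.\t.",
    "10\t3\t.\tG\tA\t.\t.\t.",
]
sorted_lines = reorder_vcf(vcf_lines, dict_contig_order(dict_lines))
print("\n".join(sorted_lines))
```

Any index built from the original VCF would no longer describe the reordered file, so it would need to be regenerated rather than reused.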