Problem: I am trying to use MuTect to analyze some exome capture data for paired tumor and normal samples. I am working with Ion Torrent Proton data.
MuTect, like GATK, requires the contigs in .bam files to be in karyotypic order.
If I try to run MuTect on the .bam files produced by TMAP, I get the following error message:
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-25-g2a68eab):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reference.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
##### ERROR You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wiki/index.php/ReorderSam
##### ERROR reference contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr20, chr21, chr22, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]
##### ERROR ------------------------------------------------------------------------------------------
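To make the ordering issue concrete, here is a minimal Python sketch (not part of MuTect, GATK, or Picard) that takes the contig list from the error message above and sorts it karyotypically. The function name karyotypic_key is my own, and placing chrM last is an assumption; the error message says M may either lead or trail the other contigs.

```python
def karyotypic_key(contig):
    """Sort key: numbered chromosomes numerically, then X, Y, M (assumed trailing)."""
    name = contig[3:] if contig.startswith("chr") else contig
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    # Unknown contigs (rank 4) go after the standard ones, ordered by name.
    return (special.get(name, 4), 0, name)


# Contig order reported in the error message for the TMAP-produced .bam file.
tmap_order = [
    "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17",
    "chr18", "chr19", "chr1", "chr20", "chr21", "chr22", "chr2", "chr3",
    "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrM", "chrX", "chrY",
]

karyotypic = sorted(tmap_order, key=karyotypic_key)
print(karyotypic)
```

This only illustrates the two orderings. It does not reorder the reads in a .bam file, which is what ReorderSam is for.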
When I try to run ReorderSam in Picard (using the reference I downloaded from ftp://ftp.broadinstitute.org/bundle/2.5/hg19/), I get the following error message:
ERROR 2013-11-22 11:34:20 ReorderSam No reference sequence dictionary found. Aborting. You can create a sequence dictionary for the reference fasta using CreateSequenceDictionary.jar.
I have no problem creating the sequence dictionary with CreateSequenceDictionary.jar, but I can't get ReorderSam to recognize the sequence dictionary. I have tried "samtools reheader" on the .bam file (using the @SQ values from CreateSequenceDictionary.jar), but I still get the same error message.
Solution: The sequence dictionary needs to have a .dict extension. In fact, a ready-made .dict file for this reference is available for download from ftp://ftp.broadinstitute.org/bundle/2.5/hg19/
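A quick way to check this before running ReorderSam is sketched below in Python. It assumes (I have not checked the Picard source) that the dictionary is expected next to the reference FASTA, with the same base name and a .dict extension in place of .fasta. The helper names and the example filename are hypothetical.

```python
from pathlib import Path


def expected_dict_path(fasta):
    """Return the .dict path assumed to belong to a reference FASTA."""
    # Assumption: same directory and base name, with the last extension swapped.
    return Path(fasta).with_suffix(".dict")


def has_sequence_dictionary(fasta):
    """True if the assumed .dict file exists on disk."""
    return expected_dict_path(fasta).is_file()


if __name__ == "__main__":
    reference = "my_reference.fasta"  # hypothetical filename
    print(expected_dict_path(reference), has_sequence_dictionary(reference))
```

If the check fails, renaming the output of CreateSequenceDictionary.jar to match (or downloading the .dict file from the bundle) is what fixed the error for me.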
This solves the problem for this specific error message, but the final solution will require extra work. For example, once the sequence dictionary is recognized, I see a new error message:
Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chrM LN=16569, ref chrM LN=16571
So, I might have to find the reference used by TMAP (the .bam file is automatically produced) and sort it myself. Nevertheless, I don't typically have to use .dict files, so I thought it might be helpful to share this with others.
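To see which contigs disagree before running ReorderSam, the @SQ lines in the .bam header can be compared against the .dict file. Here is a minimal Python sketch, assuming the header has been dumped to text first (for example with "samtools view -H") and that both inputs use the standard tab-separated @SQ lines with SN and LN tags. The function names and the two-contig example headers are made up for illustration; only the chrM lengths come from the error message above.

```python
def contig_lengths(header_text):
    """Map contig name -> length from the @SQ lines of a SAM header or .dict file."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after "@SQ" is TAG:VALUE; split on the first colon only,
        # since values such as UR:file:/path can contain colons themselves.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def discordant_contigs(bam_header, ref_dict):
    """Return {contig: (bam_length, ref_length)} for shared contigs that differ."""
    bam = contig_lengths(bam_header)
    ref = contig_lengths(ref_dict)
    return {
        name: (bam[name], ref[name])
        for name in bam
        if name in ref and bam[name] != ref[name]
    }


# Illustrative headers; chrM lengths are the ones from the Picard exception.
bam_header = "@HD\tVN:1.4\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16569\n"
ref_dict = "@HD\tVN:1.4\n@SQ\tSN:chrM\tLN:16571\n@SQ\tSN:chr1\tLN:249250621\n"

print(discordant_contigs(bam_header, ref_dict))
```

Any contig reported here means the .bam file was aligned against a different version of that sequence than the reference being passed to ReorderSam, so reordering alone will not be enough.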
Unfortunately, I figured out the solution that I posted through trial and error. I don't actually work on the Picard development team. I would recommend seeing if one of them can help you.
Perhaps this can help get you started in the right direction:
http://broadinstitute.github.io/picard/ (says to use samtools mailing list for questions)
Also, here are some possibly relevant discussion groups: