Forum: How To Provide Reference Sequence Dictionary To ReorderSam?
11.0 years ago

Problem: I am trying to use MuTect to analyze some exome capture data for paired tumor and normal samples. I am working with Ion Torrent Proton data.

Basically, MuTect requires .bam files to be ordered in karyotypic order (just like GATK).

If I try to run MuTect on the .bam files produced by TMAP, I get the following error message:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-25-g2a68eab):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reference.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
 ##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
 ##### ERROR You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wiki/index.php/ReorderSam
 ##### ERROR   reference contigs = [chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr20, chr21, chr22, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chrM, chrX, chrY]
 ##### ERROR ------------------------------------------------------------------------------------------
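To make the difference between the two orderings concrete, here is a minimal Python sketch (not part of GATK or Picard) that sorts the contig names from the error message into karyotypic order. The helper name karyotypic_key is made up for illustration, and putting chrM last is an assumption: the message allows M either first or last, so the order of the reference actually in use is what counts.

```python
def karyotypic_key(contig):
    """Sort key (hypothetical helper): autosomes numerically, then X, Y, M."""
    name = contig[3:] if contig.startswith("chr") else contig
    if name.isdigit():
        return (0, int(name), "")
    # Assumption: M goes last; unknown contigs go after everything else.
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    return (special.get(name, 4), 0, name)


# Contig order reported in the GATK error message above.
reported = [
    "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17",
    "chr18", "chr19", "chr1", "chr20", "chr21", "chr22", "chr2", "chr3",
    "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrM", "chrX", "chrY",
]

karyotypic = sorted(reported, key=karyotypic_key)
print(karyotypic)
```

This only shows the target order of the names. It does not reorder a .bam file, which is what ReorderSam is for.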

When I try to run ReorderSam in Picard (using the reference I downloaded from ftp://ftp.broadinstitute.org/bundle/2.5/hg19/), I get the following error message:

ERROR 2013-11-22 11:34:20 ReorderSam No reference sequence dictionary found. Aborting.  You can create a sequence dictionary for the reference fasta using CreateSequenceDictionary.jar.

I have no problem creating the sequence dictionary with CreateSequenceDictionary.jar, but I can't get ReorderSam to recognize the sequence dictionary. I have tried "samtools reheader" on the .bam file (using the @SQ values from CreateSequenceDictionary.jar), but I still get the same error message.

Solution: The Sequence Dictionary needs to have a .dict extension. In fact, this file is available for download from ftp://ftp.broadinstitute.org/bundle/2.5/hg19/
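A small Python sketch of the naming rule, for checking a reference before running ReorderSam. It assumes the dictionary sits next to the FASTA, named either with the FASTA extension replaced by .dict or with .dict appended. That is my understanding of where the tools look, not something confirmed here. The helper names and the file name are hypothetical, and only file names are checked, not contents.

```python
from pathlib import Path


def dict_candidates(fasta):
    """Return the .dict paths assumed to be looked up for a FASTA (hypothetical helper)."""
    fasta = Path(fasta)
    return [
        fasta.with_suffix(".dict"),       # ref.fasta -> ref.dict
        Path(str(fasta) + ".dict"),       # ref.fasta -> ref.fasta.dict
    ]


def find_dict(fasta):
    """Return the first candidate dictionary that exists on disk, or None."""
    for candidate in dict_candidates(fasta):
        if candidate.is_file():
            return candidate
    return None


# Example file name only; replace with the path to your own reference.
reference = "ucsc.hg19.fasta"
found = find_dict(reference)
if found is None:
    names = ", ".join(p.name for p in dict_candidates(reference))
    print("No sequence dictionary found; expected one of:", names)
else:
    print("Found sequence dictionary:", found)
```

If the dictionary created by CreateSequenceDictionary.jar was saved under some other name, renaming it to one of the candidates printed above should be enough for it to be picked up, under the same assumption.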

This solves the problem for this specific error message, but the final solution will require extra work. For example, once the sequence dictionary is recognized, I see a new error message:

Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chrM LN=16569, ref chrM LN=16571
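To see which contigs disagree before running ReorderSam again, the @SQ lines in the .bam header can be compared with those in the .dict file. Below is a minimal Python sketch, assuming both have already been read in as text (the .bam header, for instance, from "samtools view -H"). The function names are hypothetical. The two header snippets are toy examples: the chrM lengths come from the message above, and the chr1 line is added only to show a contig that matches.

```python
def sq_lengths(header_text):
    """Map contig name -> length from the @SQ lines of a SAM header or .dict file."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields look like "SN:chrM" and "LN:16569", separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def discordant(read_header, ref_dict):
    """Return {contig: (read_length, ref_length)} for contigs whose lengths differ."""
    reads, ref = sq_lengths(read_header), sq_lengths(ref_dict)
    return {
        name: (reads[name], ref[name])
        for name in reads
        if name in ref and reads[name] != ref[name]
    }


# Toy headers for illustration only (not the real files).
bam_header = "@HD\tVN:1.4\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16569\n"
ref_dict = "@HD\tVN:1.4\n@SQ\tSN:chrM\tLN:16571\n@SQ\tSN:chr1\tLN:249250621\n"

print(discordant(bam_header, ref_dict))
```

Any contig this reports has a different length in the reads than in the reference, which suggests the .bam was aligned to a different version of that sequence. Reordering alone will not fix that.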

So, I might have to find the reference used by TMAP (the .bam file is automatically produced) and sort it myself. Nevertheless, I don't typically have to use .dict files, so I thought it might be helpful to share this with others.

10.1 years ago
Chirag Nepal ★ 2.4k

Thanks for sharing, Charles! I am now running into a similar problem.

The error says it failed to load the reference dictionary; it seems it is not accepting the .fa file. I tried to download the file from the link you provided, but the link no longer works. This is what the error looks like. Any help would be appreciated.

My command:

java -Xmx2g -jar ~/unixTools/muTect-1.1.4-bin/muTect-1.1.4.jar --analysis_type MuTect --reference_sequence /steno-internal/chirag/data/ucsc/goldenPath/hg19/assembly.fa --cosmic ~/unixTools/muTect-1.1.4-bin/inputFiles/b37_cosmic_v54_120711.vcf --dbsnp ~/unixTools/muTect-1.1.4-bin/inputFiles/00-All.vcf --intervals 17:1577100-7577200 --input_file:normal /NextGenSeqData/project-data/chirag/rawReadsMapped/chan_et_al_2013_liverFluke/bam/P10N_3001N_merged_sort.bam --input_file:tumor /NextGenSeqData/project-data/chirag/rawReadsMapped/chan_et_al_2013_liverFluke/bam/P10T_3001T_merged_sort.bam --out /NextGenSeqData/project-data/chirag/projects/exomeLiver/testOut --vcf /NextGenSeqData/project-data/chirag/projects/exomeLiver/testname.SNP.vcf

INFO  15:54:15,264 ArgumentTypeDescriptor - Dynamically determined type of /home/chirag/unixTools/muTect-1.1.4-bin/inputFiles/00-All.vcf to be VCF
INFO  15:54:15,285 ArgumentTypeDescriptor - Dynamically determined type of /home/chirag/unixTools/muTect-1.1.4-bin/inputFiles/b37_cosmic_v54_120711.vcf to be VCF
INFO  15:54:15,301 GenomeAnalysisEngine - Strictness is SILENT
WARN  15:54:18,323 RestStorageService - Error Response: PUT '/GATK_Run_Reports/YBzRxqdRvTVRacYj2VLuLRCQyIxjGsf5.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 733, Content-MD5: 7NBpw4F1jkCwVeC5+8FwDg==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: ecd069c381758e40b055e0b9fbc1700e, Date: Mon, 29 Sep 2014 13:54:16 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:dXqD6yDLe5iY1wNeAaL+3FkR3KY=, User-Agent: JetS3t/0.8.1 (Linux/3.2.0-4-amd64; amd64; en; JVM 1.7.0_67), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 46D9A3E8A7B498D7, x-amz-id-2: 0g3fl8lY8s/xh2M443KR7UzkuDshanWb27GfNts6/hpDYnYAwQWYt0bMP6Vq7Nca, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 29 Sep 2014 13:54:18 GMT, Connection: close, Server: AmazonS3]
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.2-25-g2a68eab):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to load reference dictionary
##### ERROR ------------------------------------------------------------------------------------------

Thanks!


Unfortunately, I figured out the solution that I posted through trial and error. I don't actually work on the Picard development team. I would recommend seeing if one of them can help you.

Perhaps this can help get you started in the right direction:

http://broadinstitute.github.io/picard/ (says to use samtools mailing list for questions)

Also, here are some possibly relevant discussion groups:
