8.6 years ago
MAPK
I am trying to run a tool called ContEst ( https://www.broadinstitute.org/cancer/cga/contest_run ) to identify sample contamination in a cohort of 200 samples. I have multi-sample VCF files split per chromosome, so there are 24 files. I was not able to run the example data, and I am not sure how to supply all those VCF files and the BAM files for my 200 samples. Also, how do I create the /hg19_population_stratified_af_hapmap_3.3.vcf file they provide with their test data? Can someone please advise on this?
This is the command I used for the test data:
java -Xmx2400m -jar ../contest-1.0.24530-bin/ContEst.jar -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt
It generates this error:
INFO 16:34:06,880 HelpFormatter - -----------------------------------------------------------------------------------
INFO 16:34:06,882 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.0-6228-gdf95121, Compiled 2011/07/14 11:09:43
INFO 16:34:06,882 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 16:34:06,882 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 16:34:06,882 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 16:34:06,883 HelpFormatter - Program Args: -T Contamination -I ../ContEst_example_data/chr20_sites.bam -R ../hg19.fa -B:pop,vcf ../hg19_population_stratified_af_hapmap_3.3.vcf -B:genotypes, ../ContEst_example_data/hg00142.vcf -BTI genotypes -o Sample1.out.txt
INFO 16:34:06,883 HelpFormatter - Date/Time: 2016/04/28 16:34:06
INFO 16:34:06,883 HelpFormatter - -----------------------------------------------------------------------------------
INFO 16:34:06,883 HelpFormatter - -----------------------------------------------------------------------------------
INFO 16:34:06,889 GenomeAnalysisEngine - Strictness is SILENT
WARN 16:34:08,503 RestStorageService - Error Response: PUT '/GATK_Run_Reports/17720jPUXZENGmL5xxhVm8ktG7Wauzrh.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1546, Content-MD5: fwxUz6XYZcATFzjt7dnhiw==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 7f0c54cfa5d865c0131738ededd9e18b, Date: Thu, 28 Apr 2016 06:34:07 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:+lhsRlwcDuhvJHiD8t0aUXAJ1GY=, User-Agent: JetS3t/0.8.0 (Linux/3.13.0-71-generic; amd64; en; JVM 1.7.0_76), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 92AAE1F75FB98501, x-amz-id-2: obrad2EGQHlvJ6kwawpw0l9hJWToGIPxeUuWA3PXAi3T+i3+mdcFiMae6z0F0oqC, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 28 Apr 2016 06:34:07 GMT, Connection: close, Server: AmazonS3]
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 1.0-6228-gdf95121):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
##### ERROR
##### ERROR MESSAGE: Invalid command line: Failed to load reference dictionary
##### ERROR ------------------------------------------------------------------------------------------
So the MESSAGE is showing that there is no sequence dictionary available for your reference genome. Create one with Picard tools' CreateSequenceDictionary.

Thanks, but that is where I get an error, while creating the .dict file:
[Fri Apr 29 ] picard.sam.CreateSequenceDictionary REFERENCE=hg19.fa OUTPUT=hg19.dict TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
OpenJDK 64-Bit Server VM warning: You have loaded library disabled stack guard. The VM will try to fix the stack guard now.
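
For anyone following along, it may help to see what the .dict file actually contains: it is a plain-text SAM-style header with one @SQ line per contig (name, length, MD5 of the uppercased sequence). Below is a rough Python sketch of that layout, not Picard's implementation. The function names `read_fasta` and `sequence_dictionary` are made up for illustration, the `VN` value is an assumption, and the `UR` field that Picard normally writes is left out. It also holds each contig in memory, so treat it as a way to understand or sanity-check the format rather than a replacement for CreateSequenceDictionary.

```python
import hashlib


def read_fasta(path):
    """Yield (name, sequence) pairs; the name is the header up to the first whitespace."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Mirrors TRUNCATE_NAMES_AT_WHITESPACE=true in the Picard log above.
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_dictionary(fasta_path):
    """Return the text of a SAM-style sequence dictionary for a FASTA file."""
    # Header line; the version number here is an assumption, not taken from Picard.
    lines = ["@HD\tVN:1.5\tSO:unsorted"]
    for name, seq in read_fasta(fasta_path):
        # M5 is the MD5 of the sequence, uppercased and without line breaks.
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        lines.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return "\n".join(lines) + "\n"
```

Writing the result of `sequence_dictionary("hg19.fa")` to `hg19.dict` next to the FASTA would give a file of the same general shape. As far as I know, GATK also expects a `hg19.fa.fai` index alongside it, which `samtools faidx hg19.fa` creates.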