8.9 years ago
zwang10
I am new to BAM files and the GATK tools. I want to generate a VCF from a BAM file by running:
java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar \
-R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta \
-T HaplotypeCaller \
-I _EGAR00001038931_36843.pe.raw.sorted.bam \
--genotyping_mode DISCOVERY \
-stand_emit_conf 10 \
-stand_call_conf 30 \
-o raw_variants.vcf
But I got this error:
INFO 19:11:19,792 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:11:19,798 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 19:11:19,798 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 19:11:19,799 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 19:11:19,807 HelpFormatter - Program Args: -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf
INFO 19:11:19,820 HelpFormatter - Executing as zwang10@zwang10-K55N on Linux 3.13.0-74-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b02.
INFO 19:11:19,821 HelpFormatter - Date/Time: 2016/01/03 19:11:19
INFO 19:11:19,822 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:11:19,823 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:11:20,220 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:11:20,537 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 19:11:20,554 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 19:11:20,783 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.23
INFO 19:11:20,887 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 19:11:21,120 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 19:11:22,436 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:11:22,437 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:11:22,439 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 19:11:22,440 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 19:11:22,441 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO 19:11:22,562 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 19:11:22,563 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 19:11:22,565 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 19:11:22,930 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 19:11:27,999 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@51762faf is malformed. Please see http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-input-files-for-sequence-read-data-bam-cramfor more information. Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing the read group (RG) tag, which is required by the GATK. Please see http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem
##### ERROR ------------------------------------------------------------------------------------------
zwang10@zwang10-K55N:/media/zwang10/Elements/UK10K$ java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf^C
zwang10@zwang10-K55N:/media/zwang10/Elements/UK10K$ java -jar /media/zwang10/Elements/UK10K/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf > error
INFO 19:14:00,776 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:14:00,783 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56
INFO 19:14:00,784 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 19:14:00,785 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 19:14:00,793 HelpFormatter - Program Args: -R /media/zwang10/Elements/UK10K/human_g1k_v37.fasta -T HaplotypeCaller -I _EGAR00001038931_36843.pe.raw.sorted.bam --genotyping_mode DISCOVERY -stand_emit_conf 10 -stand_call_conf 30 -o raw_variants.vcf
INFO 19:14:00,806 HelpFormatter - Executing as zwang10@zwang10-K55N on Linux 3.13.0-74-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_91-b02.
INFO 19:14:00,807 HelpFormatter - Date/Time: 2016/01/03 19:14:00
INFO 19:14:00,808 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:14:00,808 HelpFormatter - --------------------------------------------------------------------------------
INFO 19:14:01,199 GenomeAnalysisEngine - Strictness is SILENT
INFO 19:14:01,500 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 19:14:01,517 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 19:14:01,668 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.15
INFO 19:14:01,739 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 19:14:01,982 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 19:14:03,265 GenomeAnalysisEngine - Done preparing for traversal
INFO 19:14:03,266 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 19:14:03,268 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 19:14:03,269 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 19:14:03,270 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO 19:14:03,390 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 19:14:03,391 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 19:14:03,393 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 19:14:03,675 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 19:14:08,680 GATKRunReport - Uploaded run statistics report to AWS S3
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@5c0bb1d5 is malformed. Please see http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-input-files-for-sequence-read-data-bam-cramfor more information. Error details: Read FCC03A6ABXX:3:2107:11142:198335#TAGCTTAT is missing the read group (RG) tag, which is required by the GATK. Please see http://gatkforums.broadinstitute.org/discussion/59/companion-utilities-replacereadgroups to fix this problem
##### ERROR ------------------------------------------------------------------------------------------
Is there a way to add the missing RG tag?
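For context, one tool commonly used for this error is Picard's AddOrReplaceReadGroups, which writes a new BAM in which every read carries an RG tag. The following is only a minimal sketch under assumptions: the location of `picard.jar` and every read-group value (`rg1`, `lib1`, `illumina`, `unit1`, `sample1`) are placeholders that do not come from this dataset, and the output file name is hypothetical. It prints the command by default and only runs it when `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: add a read group to a BAM so that GATK 3.x will accept it.
# Assumptions (not from the original post): the picard.jar path and all
# RG* values below are placeholders; replace them with real metadata.
set -euo pipefail

PICARD_JAR="${PICARD_JAR:-picard.jar}"            # hypothetical location
IN_BAM="_EGAR00001038931_36843.pe.raw.sorted.bam" # input from the command above
OUT_BAM="${IN_BAM%.bam}.rg.bam"                   # hypothetical output name

# Build the command as an array so every argument stays correctly quoted.
cmd=(java -jar "$PICARD_JAR" AddOrReplaceReadGroups
     I="$IN_BAM" O="$OUT_BAM"
     RGID=rg1 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=sample1
     CREATE_INDEX=true)

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [[ "${RUN:-0}" == "1" ]]; then
    "${cmd[@]}"
else
    printf '%s ' "${cmd[@]}"
    printf '\n'
fi
```

If this works as expected, the resulting `.rg.bam` file would be passed to `-I` in place of the original BAM. Running `samtools view -H` on a BAM and looking for `@RG` lines is a quick way to check whether read groups are declared in the header.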
As it is said: