6.9 years ago
pinn
Hi, I am trying to call variants from a BAM file. I ran the command below and it generated mem_UG.vcf (481 MB). From the messages at the end of the run, I can't tell whether it produced correct output.
**CMD**
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP
likithreddy@likith:~/Downloads/GATK$ java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP
INFO 17:13:17,759 HelpFormatter - ----------------------------------------------------------------------------------
INFO 17:13:17,793 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 17:13:17,793 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 17:13:17,794 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 17:13:17,794 HelpFormatter - [Fri Jan 05 17:13:17 IST 2018] Executing on Linux 4.8.0-36-generic amd64
INFO 17:13:17,794 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12
INFO 17:13:17,798 HelpFormatter - Program Args: -T UnifiedGenotyper -R hg38.fa -I /home/likithreddy/Documents/Cancergenomics/ReadgroupsSRR098401mem_pesort.bam -o mem_UG.vcf -glm SNP
INFO 17:13:17,838 HelpFormatter - Executing as likithreddy@likith on Linux 4.8.0-36-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12.
INFO 17:13:17,839 HelpFormatter - Date/Time: 2018/01/05 17:13:17
INFO 17:13:17,839 HelpFormatter - ----------------------------------------------------------------------------------
INFO 17:13:17,839 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/likithreddy/Downloads/GATK/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 17:13:18,129 GenomeAnalysisEngine - Deflater: JdkDeflater
INFO 17:13:18,129 GenomeAnalysisEngine - Inflater: JdkInflater
INFO 17:13:18,130 GenomeAnalysisEngine - Strictness is SILENT
INFO 17:13:18,426 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 250
INFO 17:13:18,434 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 17:13:18,519 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 17:13:18,843 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 17:13:19,168 GenomeAnalysisEngine - Done preparing for traversal
INFO 17:13:19,168 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 17:13:19,169 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 17:13:19,169 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 17:13:19,247 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 17:13:19,247 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 17:13:19,248 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 17:13:49,172 ProgressMeter - chr1:5268065 5259264.0 30.0 s 5.0 s 0.2% 5.1 h 5.1 h
INFO 17:14:19,173 ProgressMeter - chr1:10664901 1.06496E7 60.0 s 5.0 s 0.3% 5.0 h 5.0 h
INFO 17:14:49,174 ProgressMeter - chr1:15275489 1.5269888E7 90.0 s 5.0 s 0.5% 5.3 h 5.2 h
**At End** (END Message)
INFO 21:14:29,686 ProgressMeter - chrY_KI270740v1_random:37201 3.209248865E9 4.0 h 4.0 s 100.0% 4.0 h 0.0 s
INFO 21:14:29,686 ProgressMeter - done 3.209286105E9 4.0 h 4.0 s 100.0% 4.0 h 0.0 s
INFO 21:14:29,687 ProgressMeter - Total runtime 14470.52 secs, 241.18 min, 4.02 hours
INFO 21:14:29,687 MicroScheduler - 1053193 reads were filtered out during the traversal out of approximately 185715730 total reads (0.57%)
INFO 21:14:29,687 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 21:14:29,688 MicroScheduler - -> 836765 reads (0.45% of total) failing BadMateFilter
INFO 21:14:29,688 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 21:14:29,688 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 21:14:29,688 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 21:14:29,688 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 21:14:29,688 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 21:14:29,688 MicroScheduler - -> 216428 reads (0.12% of total) failing UnmappedReadFilter
------------------------------------------------------------------------------------------
Done. There were 1 WARN messages, the first 1 are repeated below.
WARN 17:13:19,247 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
May I ask why you did not use GATK HaplotypeCaller?
I see only one error in logging
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory
and the rest looks normal to me
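One way to double-check a GATK 3 log like the one above is to scan it for the completion markers and count the WARN and ERROR lines. Below is a minimal sketch, assuming the console output was saved as plain text; `summarize_gatk_log` is a hypothetical helper (not part of GATK), and the sample text is a few lines copied from the log in the question.

```python
import re

def summarize_gatk_log(text):
    """Summarize a GATK 3.x console log (hypothetical helper, not a GATK API)."""
    lines = text.splitlines()
    summary = {
        # In the log above, a finished traversal ends with
        # "ProgressMeter - done" followed by a "Total runtime" line.
        "finished": any("ProgressMeter - done" in l for l in lines)
                    and any("Total runtime" in l for l in lines),
        "errors": [l for l in lines if l.startswith("ERROR")],
        "warnings": [l for l in lines if l.startswith("WARN")],
        "filtered_fraction": None,
    }
    # MicroScheduler reports how many reads were filtered out of the total
    m = re.search(r"(\d+) reads were filtered out .* approximately (\d+) total reads", text)
    if m:
        summary["filtered_fraction"] = int(m.group(1)) / int(m.group(2))
    return summary

# A few lines taken from the UnifiedGenotyper log in the question
sample = """\
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
WARN 17:13:19,247 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 21:14:29,686 ProgressMeter - done 3.209286105E9 4.0 h 4.0 s 100.0% 4.0 h 0.0 s
INFO 21:14:29,687 ProgressMeter - Total runtime 14470.52 secs, 241.18 min, 4.02 hours
INFO 21:14:29,687 MicroScheduler - 1053193 reads were filtered out during the traversal out of approximately 185715730 total reads (0.57%)
"""

result = summarize_gatk_log(sample)
print(result["finished"], len(result["errors"]), len(result["warnings"]))
print(f"{result['filtered_fraction']:.2%}")
```

Note that GATK repeats the WARN messages after the final "Done." line, so on a full log the warning count includes those repeats.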
Hi, can you check this HaplotypeCaller run? It generated a 308 MB VCF file. Is this the correct way to do it? I ran it on my workstation with 16 GB RAM, and the elapsed time was 8:45 h. Can you briefly explain the difference between the HC and UG tools?
**CMD**
likith@likith-VPCEG2AEN:~/Downloads/GenomeAnalysisTK-3.8-0-ge9d806836$ java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg38.fa -I /media/likith/REDDY/MEM/ReadgroupsSRR098401mem_pesort.bam -o mem.vcf
INFO 14:43:26,496 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:43:26,499 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 14:43:26,499 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 14:43:26,499 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 14:43:26,499 HelpFormatter - [Fri Jan 05 14:43:26 IST 2018] Executing on Linux 4.10.0-42-generic amd64
INFO 14:43:26,499 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12
INFO 14:43:26,503 HelpFormatter - Program Args: -T HaplotypeCaller -R hg38.fa -I /media/likith/REDDY/MEM/ReadgroupsSRR098401mem_pesort.bam -o mem.vcf
INFO 14:43:26,506 HelpFormatter - Executing as likith@likith-VPCEG2AEN on Linux 4.10.0-42-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12.
INFO 14:43:26,506 HelpFormatter - Date/Time: 2018/01/05 14:43:26
INFO 14:43:26,506 HelpFormatter - ----------------------------------------------------------------------------------
INFO 14:43:26,506 HelpFormatter - ----------------------------------------------------------------------------------
ERROR StatusLogger Unable to create class org.apache.logging.log4j.core.impl.Log4jContextFactory specified in jar:file:/home/likith/Downloads/GenomeAnalysisTK-3.8-0-ge9d806836/GenomeAnalysisTK.jar!/META-INF/log4j-provider.properties
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
INFO 14:43:26,637 GenomeAnalysisEngine - Deflater: IntelDeflater
INFO 14:43:26,637 GenomeAnalysisEngine - Inflater: IntelInflater
INFO 14:43:26,638 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:43:26,942 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
INFO 14:43:26,948 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:43:27,031 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 14:43:27,296 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 14:43:27,414 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
INFO 14:43:27,720 GenomeAnalysisEngine - Done preparing for traversal
INFO 14:43:27,721 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 14:43:27,721 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 14:43:27,721 ProgressMeter - Location | active regions | elapsed | active regions | completed | runtime | runtime
INFO 14:43:27,722 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
INFO 14:43:27,848 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
WARN 14:43:27,848 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
INFO 14:43:27,849 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
INFO 14:43:27,983 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
INFO 14:43:34,599 VectorLoglessPairHMM - Using OpenMP multi-threaded AVX-accelerated native PairHMM implementation
[INFO] Available threads: 4
[INFO] Requested threads: 1
[INFO] Using 1 threads
WARN 14:43:34,677 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
INFO 14:43:57,724 ProgressMeter - chr1:2388093 0.0 30.0 s 49.6 w 0.1% 11.2 h 11.2 h
INFO 14:44:27,725 ProgressMeter - chr1:5671939 0.0 60.0 s 99.2 w 0.2% 9.4 h 9.4 h
INFO 14:44:57,726 ProgressMeter - chr1:9313903 0.0 90.0 s 148.8 w 0.3% 8.6 h 8.6 h
INFO 14:45:27,727 ProgressMeter - chr1:12327961 0.0 120.0 s 198.4 w 0.4% 8.7 h 8.6 h
INFO 14:46:07,728 ProgressMeter - chr1:15564203 0.0 2.7 m 264.6 w 0.5% 9.2 h 9.1 h
*Last Few Lines*
INFO 22:26:49,098 ProgressMeter - chrY:57093381 3.15202145E9 7.7 h 8.0 s 100.0% 7.7 h 1.0 s
INFO 22:26:49,178 VectorLoglessPairHMM - Time spent in setup for JNI call : 8.726722330000001
INFO 22:26:49,180 PairHMM - Total compute time in PairHMM computeLikelihoods() : 2404.092605758
INFO 22:26:49,181 HaplotypeCaller - Ran local assembly on 1433982 active regions
INFO 22:26:49,721 ProgressMeter - done 3.209286105E9 7.7 h 8.0 s 100.0% 7.7 h 0.0 s
INFO 22:26:49,722 ProgressMeter - Total runtime 27802.00 secs, 463.37 min, 7.72 hours
INFO 22:26:49,723 MicroScheduler - 17692810 reads were filtered out during the traversal out of approximately 184916695 total reads (9.57%)
INFO 22:26:49,723 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 22:26:49,724 MicroScheduler - -> 0 reads (0.00% of total) failing DuplicateReadFilter
INFO 22:26:49,724 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 22:26:49,725 MicroScheduler - -> 17692810 reads (9.57% of total) failing HCMappingQualityFilter
INFO 22:26:49,725 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 22:26:49,726 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 22:26:49,726 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 22:26:49,727 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter
Done. There were 4 WARN messages, the first 4 are repeated below.
WARN 14:43:27,848 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
WARN 14:43:34,677 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper, not HaplotypeCaller
WARN 16:55:01,654 HaplotypeCallerGenotypingEngine - location chr14_GL000225v1_random:67535: too many alternative alleles found (9) larger than the maximum requested with -maxAltAlleles (6), the following will be dropped: GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGTACATCGTGACATATCGCTTCAATGATCACCCAT, GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGTACATTGTGACATATCGCTTCAATGATCACCCAT, GGTGATGCAACTCTTGCCTAGGCTTTGCCTACAGGGGACATCGTGACATATCGCTTCAATGATCACCCAT.
WARN 19:10:59,577 HaplotypeCallerGenotypingEngine - location chr22:43972848-43972851: too many alternative alleles found (7) larger than the maximum requested with -maxAltAlleles (6), the following will be dropped: CTTT.
First:
Regarding the error, please follow this thread:
https://gatkforums.broadinstitute.org/gatk/discussion/10004/realignertargetcreator-hangs
which suggests an error in the build.
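Independently of the log, the VCF itself can be sanity-checked by confirming that it has a header and by counting records per contig. Below is a minimal sketch in plain Python, assuming an uncompressed VCF; `count_vcf_records` is a hypothetical helper, and the tiny in-memory VCF is made up for illustration rather than taken from mem_UG.vcf.

```python
import io
from collections import Counter

def count_vcf_records(handle):
    """Return (has_header, records_per_contig) for an uncompressed VCF stream."""
    has_fileformat = False
    has_column_line = False
    per_contig = Counter()
    for line in handle:
        if line.startswith("##fileformat=VCF"):
            has_fileformat = True
        elif line.startswith("#CHROM"):
            has_column_line = True
        elif line.startswith("#") or not line.strip():
            continue  # other meta lines and blank lines
        else:
            # The first tab-separated field of a data line is the contig name
            per_contig[line.split("\t", 1)[0]] += 1
    return has_fileformat and has_column_line, per_contig

# Made-up example data for illustration only
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10177\t.\tA\tC\t50.0\t.\t.\n"
    "chr1\t10352\t.\tT\tA\t60.0\t.\t.\n"
    "chr2\t20000\t.\tG\tT\t45.0\t.\t.\n"
)
ok, counts = count_vcf_records(example)
print(ok, dict(counts))
```

For a real file, the same function can be called on an opened file, e.g. `with open("mem_UG.vcf") as fh: ok, counts = count_vcf_records(fh)`. A header plus records spread across the expected chromosomes suggests the file was written completely, though it says nothing about call quality.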
Why HaplotypeCaller?!
I'm not going to add formatting to your post again. I told you in another comment how to do that. You should put some more effort into this yourself.
I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.