16 months ago
Manuel Sokolov Ravasqueira
I am using GATK GetPileupSummaries in the following way:
GENOME="/FILES/HUMAN_REFERENCES/hg19.fa"
RECBAM="/FILES/${patient_id}/${patient_id}.recalibrated.bam"
intervals_list="/FILES/HUMAN_REFERENCES/wgs_calling_regions.v1.interval_list"
GERM="/FILES/HUMAN_REFERENCES/small_exac_common_3-hg19.vcf"
PON="/FILES/HUMAN_REFERENCES/Mutect2-WGS-panel-b37-hg19.vcf"
export GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"
VCF="/FILES/${patient_id}/${patient_id}.recalibrated.vcf"
OUTPUT="/FILES/${patient_id}/${patient_id}.getpileupsummaries.table"
srun /mnt/beegfs/apptainer/images/gatk4.sif gatk GetPileupSummaries \
-I $RECBAM \
-L $GERM \
-O $OUTPUT \
-V $GERM
This results in the following error:
16:37:40.248 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.259 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.353 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.360 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.359 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.359 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.360 INFO GetPileupSummaries - Executing as manuelravasqueira@compute-4.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.360 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.360 INFO GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.366 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.360 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.366 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.360 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.366 INFO GetPileupSummaries - Executing as manuelravasqueira@compute-11.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.361 INFO GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.366 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.361 INFO GetPileupSummaries - Picard Version: 3.0.0
16:37:40.366 INFO GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.362 INFO GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.366 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.362 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.366 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.362 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.367 INFO GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.363 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.367 INFO GetPileupSummaries - Picard Version: 3.0.0
16:37:40.363 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.368 INFO GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.363 INFO GetPileupSummaries - Deflater: IntelDeflater
16:37:40.368 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.363 INFO GetPileupSummaries - Inflater: IntelInflater
16:37:40.368 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.364 INFO GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.369 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.364 INFO GetPileupSummaries - Requester pays: disabled
16:37:40.369 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.364 INFO GetPileupSummaries - Initializing engine
16:37:40.369 INFO GetPileupSummaries - Deflater: IntelDeflater
16:37:40.369 INFO GetPileupSummaries - Inflater: IntelInflater
16:37:40.370 INFO GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.370 INFO GetPileupSummaries - Requester pays: disabled
16:37:40.370 INFO GetPileupSummaries - Initializing engine
16:37:40.472 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.545 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.559 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.583 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.589 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.589 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.589 INFO GetPileupSummaries - Executing as manuelravasqueira@compute-10.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.589 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.590 INFO GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.590 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.590 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.591 INFO GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.591 INFO GetPileupSummaries - Picard Version: 3.0.0
16:37:40.591 INFO GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.591 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.592 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.592 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.592 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.592 INFO GetPileupSummaries - Deflater: IntelDeflater
16:37:40.593 INFO GetPileupSummaries - Inflater: IntelInflater
16:37:40.593 INFO GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.593 INFO GetPileupSummaries - Requester pays: disabled
16:37:40.594 INFO GetPileupSummaries - Initializing engine
16:37:40.642 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.647 INFO GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.648 INFO GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.648 INFO GetPileupSummaries - Executing as manuelravasqueira@compute-12.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.648 INFO GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.648 INFO GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.648 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.648 INFO GetPileupSummaries - ------------------------------------------------------------
16:37:40.649 INFO GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.649 INFO GetPileupSummaries - Picard Version: 3.0.0
16:37:40.650 INFO GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.650 INFO GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.650 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.650 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.651 INFO GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.651 INFO GetPileupSummaries - Deflater: IntelDeflater
16:37:40.651 INFO GetPileupSummaries - Inflater: IntelInflater
16:37:40.651 INFO GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.651 INFO GetPileupSummaries - Requester pays: disabled
16:37:40.652 INFO GetPileupSummaries - Initializing engine
16:37:40.690 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.783 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.832 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.194 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.317 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.378 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.645 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:45:33.143 INFO IntervalArgumentCollection - Processing 331680222 bp from intervals
16:45:43.805 INFO GetPileupSummaries - Done initializing engine
16:45:43.828 INFO ProgressMeter - Starting traversal
16:45:43.828 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
16:46:31.241 INFO IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:33.178 INFO IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:35.022 INFO IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:41.551 INFO GetPileupSummaries - Done initializing engine
16:46:41.573 INFO ProgressMeter - Starting traversal
16:46:41.574 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
16:46:41.757 INFO GetPileupSummaries - Done initializing engine
16:46:41.784 INFO ProgressMeter - Starting traversal
16:46:41.784 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
16:46:43.969 INFO GetPileupSummaries - Done initializing engine
16:46:43.991 INFO ProgressMeter - Starting traversal
16:46:43.992 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
17:04:32.281 INFO GetPileupSummaries - Shutting down engine
[July 29, 2023 at 5:04:32 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 26.87 minutes.
Runtime.totalMemory()=22481469440
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.BitSet.initWords(BitSet.java:169)
at java.base/java.util.BitSet.<init>(BitSet.java:164)
at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:930)
at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:947)
at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:628)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:413)
at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
at java.base/java.lang.Iterable.spliterator(Iterable.java:101)
at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:384)
at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:174)
at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:149)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
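For reference, the Runtime.totalMemory() figure printed just before the exception is in bytes. A minimal sketch of the conversion follows; the helper names bytes_to_mib and bytes_to_gib are made up for illustration and are not part of GATK:

```shell
#!/usr/bin/env bash
# Convert a byte count (such as the Runtime.totalMemory() value in a GATK log)
# into MiB and GiB. Both function names are hypothetical helpers.

bytes_to_mib() {
  # Integer division is enough for MiB.
  echo $(( $1 / 1024 / 1024 ))
}

bytes_to_gib() {
  # Shell arithmetic is integer-only, so use awk for the fractional part.
  LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

bytes_to_mib 22481469440   # prints 21440
bytes_to_gib 22481469440   # prints 20.94
```

So the JVM was holding roughly 21 GiB when it shut down. That number describes the Java process, not the total RAM of the node it ran on.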
I added that option to GetPileupSummaries and tried with both 16G and 32G, with the same result. Could it be related to setting GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"?
You might need more than 32 GB. Some bioinformatics tools require >512 GB. I don't use GATK, but I'd try a bigger machine or allocate more RAM.
Thank you. I am already trying with 4 nodes, each with 200 GB, and I get the same result. EDIT: I tried with even more computing power and it worked! Thank you!
Please accept Pierre's answer to mark the question as solved. EDIT: I'm accepting Pierre's answer because OP has not been active in 4 months.
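For anyone landing here with the same OutOfMemoryError: the thread does not show exactly which option was used, but one common way to give GATK more heap is the wrapper's --java-options argument with the JVM's -Xmx flag. The sketch below only prints the command (a dry run) so it can be checked before submitting. The function name build_pileup_cmd, the 64 GB value, and the file names are all placeholders, not values from the original post:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a GetPileupSummaries command with an
# explicit JVM heap cap. Arguments: heap size in GB, BAM, sites VCF, output table.
build_pileup_cmd() {
  local heap_gb="$1" bam="$2" sites="$3" out="$4"
  # The same sites file is passed to -V and -L, as in the original command.
  printf 'gatk --java-options "-Xmx%sg" GetPileupSummaries -I "%s" -V "%s" -L "%s" -O "%s"\n' \
    "$heap_gb" "$bam" "$sites" "$sites" "$out"
}

# Placeholder inputs, chosen only for illustration.
build_pileup_cmd 64 sample.recalibrated.bam common_sites.vcf sample.pileups.table
```

The heap value should stay below the memory the scheduler actually grants the job, otherwise the job may be killed before Java reports anything. Also note that the log above shows the tool starting on four different hosts (compute-4, compute-10, compute-11, compute-12), which suggests srun launched several copies of the same command. If that was not intended, it may be worth checking the number of tasks requested, since extra nodes do not raise the heap of a single JVM.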