7.9 years ago
14134125465346445
Hi,
I am running Picard MarkDuplicates on a mid-size BAM of about 5 GB. The process starts as expected, but after about half an hour it gets stuck at 'Traversing read pair information and detecting duplicates.'
I left the process running for about 16 hours and it didn't complete. I restarted it, and now it's stuck at the same point. See the log below.
Any ideas?
++ java -Xmx11984m -jar picard.jar MarkDuplicates I=/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam O=./CEG53-67-1a_S1_L00.deduplicated.bam M=./CEG53-67-1a_S1_L00.duplication_metrics CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=true
[Wed Feb 01 06:34:45 UTC 2017] picard.sam.markduplicates.MarkDuplicates INPUT=[/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam] OUTPUT=./CEG53-67-1a_S1_L00.deduplicated.bam METRICS_FILE=./CEG53-67-1a_S1_L00.duplication_metrics REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Feb 01 06:34:45 UTC 2017] Executing as root@job-F28gykj0v0qxqBffJjpxbqGq.dnanex.us on Linux 3.2.0-120-virtual amd64; OpenJDK 64-Bit Server VM 1.7.0_121-b00; Picard version: 1.131(cd60f90fdca902499c70a4472b6162ef37f919ce_1431022382) IntelDeflater
INFO 2017-02-01 06:34:45 MarkDuplicates Start of doWork freeMemory: 231924544; totalMemory: 235405312; maxMemory: 11169955840
INFO 2017-02-01 06:34:45 MarkDuplicates Reading input file and constructing read end information.
INFO 2017-02-01 06:34:45 MarkDuplicates Will retain up to 42961368 data points before spilling to disk.
INFO 2017-02-01 06:34:55 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:09s. Time for last 1,000,000: 9s. Last read position: chr1:13,362,028
[...]
INFO 2017-02-01 06:57:45 MarkDuplicates Tracking 43467 as yet unmatched pairs. 6393 records in RAM.
INFO 2017-02-01 06:57:54 MarkDuplicates Read 167,000,000 records. Elapsed time: 00:23:08s. Time for last 1,000,000: 8s. Last read position: chrUn_GL000224v1:57,921
INFO 2017-02-01 06:57:54 MarkDuplicates Tracking 9521 as yet unmatched pairs. 78 records in RAM.
INFO 2017-02-01 06:57:56 MarkDuplicates Read 167152884 records. 0 pairs never matched.
INFO 2017-02-01 06:59:05 MarkDuplicates After buildSortedReadEndLists freeMemory: 9921432520; totalMemory: 9981919232; maxMemory: 11169955840
INFO 2017-02-01 06:59:05 MarkDuplicates Will retain up to 349061120 duplicate indices before spilling to disk.
INFO 2017-02-01 06:59:06 MarkDuplicates Traversing read pair information and detecting duplicates.
Is there a possibility to increase the memory, e.g.:
-Xmx20g
Also, point the process at a tmp directory with enough free space; in general, in Java you can set one with:
-Djava.io.tmpdir=`pwd`/mytmp
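Putting those two suggestions together, a revised invocation might look like the sketch below. The heap size, file names, and tmp path are illustrative, not taken from the original command; `TMP_DIR` is Picard's own option for scratch space, in addition to the JVM-level `-Djava.io.tmpdir`.

```shell
# Create a scratch directory on a disk with plenty of free space
# (MarkDuplicates spills read-end data here when RAM fills up).
mkdir -p "$(pwd)/mytmp"

# Larger heap (-Xmx20g) plus an explicit tmp directory, both at the
# JVM level and via Picard's TMP_DIR option. Input/output names are
# placeholders for the actual BAM paths.
java -Xmx20g -Djava.io.tmpdir="$(pwd)/mytmp" -jar picard.jar MarkDuplicates \
    I=input.sorted.bam \
    O=output.deduplicated.bam \
    M=duplication_metrics.txt \
    TMP_DIR="$(pwd)/mytmp" \
    CREATE_INDEX=true \
    VALIDATION_STRINGENCY=SILENT \
    REMOVE_DUPLICATES=true
```

If the job still stalls at the traversal step, it is worth checking that the tmp filesystem is not full, since spilling to a full disk can hang silently rather than fail.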