7.9 years ago
14134125465346445
Hi,
I am running Picard MarkDuplicates on a mid-size BAM of about 5 GB. The process starts as expected, but after about half an hour it gets stuck at 'Traversing read pair information and detecting duplicates.'
I left the process running for about 16 hours and it didn't complete. I restarted it, and now it's stuck at the same point. See the log below.
Any ideas?
++ java -Xmx11984m -jar picard.jar MarkDuplicates I=/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam O=./CEG53-67-1a_S1_L00.deduplicated.bam M=./CEG53-67-1a_S1_L00.duplication_metrics CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=true
[Wed Feb 01 06:34:45 UTC 2017] picard.sam.markduplicates.MarkDuplicates INPUT=[/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam] OUTPUT=./CEG53-67-1a_S1_L00.deduplicated.bam METRICS_FILE=./CEG53-67-1a_S1_L00.duplication_metrics REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Feb 01 06:34:45 UTC 2017] Executing as root@job-F28gykj0v0qxqBffJjpxbqGq.dnanex.us on Linux 3.2.0-120-virtual amd64; OpenJDK 64-Bit Server VM 1.7.0_121-b00; Picard version: 1.131(cd60f90fdca902499c70a4472b6162ef37f919ce_1431022382) IntelDeflater
INFO 2017-02-01 06:34:45 MarkDuplicates Start of doWork freeMemory: 231924544; totalMemory: 235405312; maxMemory: 11169955840
INFO 2017-02-01 06:34:45 MarkDuplicates Reading input file and constructing read end information.
INFO 2017-02-01 06:34:45 MarkDuplicates Will retain up to 42961368 data points before spilling to disk.
INFO 2017-02-01 06:34:55 MarkDuplicates Read 1,000,000 records. Elapsed time: 00:00:09s. Time for last 1,000,000: 9s. Last read position: chr1:13,362,028
[...]
INFO 2017-02-01 06:57:45 MarkDuplicates Tracking 43467 as yet unmatched pairs. 6393 records in RAM.
INFO 2017-02-01 06:57:54 MarkDuplicates Read 167,000,000 records. Elapsed time: 00:23:08s. Time for last 1,000,000: 8s. Last read position: chrUn_GL000224v1:57,921
INFO 2017-02-01 06:57:54 MarkDuplicates Tracking 9521 as yet unmatched pairs. 78 records in RAM.
INFO 2017-02-01 06:57:56 MarkDuplicates Read 167152884 records. 0 pairs never matched.
INFO 2017-02-01 06:59:05 MarkDuplicates After buildSortedReadEndLists freeMemory: 9921432520; totalMemory: 9981919232; maxMemory: 11169955840
INFO 2017-02-01 06:59:05 MarkDuplicates Will retain up to 349061120 duplicate indices before spilling to disk.
INFO 2017-02-01 06:59:06 MarkDuplicates Traversing read pair information and detecting duplicates.
Is there a possibility to increase the memory, e.g.:
-Xmx20g
Also, point the process at a tmp directory with enough free space; in general, in Java you can set one with:
-Djava.io.tmpdir=`pwd`/mytmp
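Putting those two suggestions together, a revised invocation might look like the sketch below. The heap size, file names, and tmp path are illustrative, not taken from the original command; `TMP_DIR` is Picard's own option for scratch space, in addition to the JVM-level `-Djava.io.tmpdir`.

```shell
# Create a scratch directory on a disk with plenty of free space
# (MarkDuplicates spills read-end data here when RAM fills up).
mkdir -p "$(pwd)/mytmp"

# Larger heap (-Xmx20g) plus an explicit tmp directory, both at the
# JVM level and via Picard's TMP_DIR option. Input/output names are
# placeholders for the actual BAM paths.
java -Xmx20g -Djava.io.tmpdir="$(pwd)/mytmp" -jar picard.jar MarkDuplicates \
    I=input.sorted.bam \
    O=output.deduplicated.bam \
    M=duplication_metrics.txt \
    TMP_DIR="$(pwd)/mytmp" \
    CREATE_INDEX=true \
    VALIDATION_STRINGENCY=SILENT \
    REMOVE_DUPLICATES=true
```

If the job still stalls at the traversal step, it is worth checking that the tmp filesystem is not full, since spilling to a full disk can hang silently rather than fail.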