Picard MarkDuplicates failing without error message
3.5 years ago
Jordi

Hi,

I am running Picard's MarkDuplicates command on a bam file that I generated following GATK's Best Practices guidelines. The command has now failed multiple times, at different stages, without any error message or warning. The last few lines of the most recent attempt are the following:

INFO    2021-06-28 13:43:06 MarkDuplicates  Read   665,000,000 records.  Elapsed time: 02:01:54s.  Time for last 1,000,000:    5s.  Last read position: chr8:30,463,617
INFO    2021-06-28 13:43:06 MarkDuplicates  Tracking 6745737 as yet unmatched pairs. 344808 records in RAM.
INFO    2021-06-28 13:43:18 MarkDuplicates  Read   666,000,000 records.  Elapsed time: 02:02:06s.  Time for last 1,000,000:   11s.  Last read position: chr8:32,749,364
INFO    2021-06-28 13:43:18 MarkDuplicates  Tracking 6744138 as yet unmatched pairs. 339607 records in RAM.
INFO    2021-06-28 13:43:23 MarkDuplicates  Read   667,000,000 records.  Elapsed time: 02:02:11s.  Time for last 1,000,000:    5s.  Last read position: chr8:35,027,712
INFO    2021-06-28 13:43:23 MarkDuplicates  Tracking 6742550 as yet unmatched pairs. 334065 records in RAM.
INFO    2021-06-28 13:44:39 MarkDuplicates  Read   668,000,000 records.  Elapsed time: 02:03:27s.  Time for last 1,000,000:   76s.  Last read position: chr8:37,295,885
INFO    2021-06-28 13:44:39 MarkDuplicates  Tracking 6740947 as yet unmatched pairs. 328694 records in RAM.
INFO    2021-06-28 13:44:47 MarkDuplicates  Read   669,000,000 records.  Elapsed time: 02:03:35s.  Time for last 1,000,000:    7s.  Last read position: chr8:39,537,579
INFO    2021-06-28 13:44:47 MarkDuplicates  Tracking 6739367 as yet unmatched pairs. 323206 records in RAM.
INFO    2021-06-28 13:45:57 MarkDuplicates  Read   670,000,000 records.  Elapsed time: 02:04:45s.  Time for last 1,000,000:   69s.  Last read position: chr8:41,790,226
INFO    2021-06-28 13:45:57 MarkDuplicates  Tracking 6737773 as yet unmatched pairs. 317727 records in RAM.
INFO    2021-06-28 13:46:03 MarkDuplicates  Read   671,000,000 records.  Elapsed time: 02:04:51s.  Time for last 1,000,000:    5s.  Last read position: chr8:43,092,877
INFO    2021-06-28 13:46:03 MarkDuplicates  Tracking 7007382 as yet unmatched pairs. 583427 records in RAM.
INFO    2021-06-28 13:46:11 MarkDuplicates  Read   672,000,000 records.  Elapsed time: 02:04:59s.  Time for last 1,000,000:    7s.  Last read position: chr8:43,092,916
INFO    2021-06-28 13:46:11 MarkDuplicates  Tracking 7094903 as yet unmatched pairs. 668562 records in RAM.
INFO    2021-06-28 13:46:22 MarkDuplicates  Read   673,000,000 records.  Elapsed time: 02:05:10s.  Time for last 1,000,000:   11s.  Last read position: chr8:43,094,783
INFO    2021-06-28 13:46:22 MarkDuplicates  Tracking 7144551 as yet unmatched pairs. 715971 records in RAM.
INFO    2021-06-28 13:46:29 MarkDuplicates  Read   674,000,000 records.  Elapsed time: 02:05:17s.  Time for last 1,000,000:    7s.  Last read position: chr8:43,095,887
INFO    2021-06-28 13:46:29 MarkDuplicates  Tracking 6905139 as yet unmatched pairs. 474456 records in RAM.
INFO    2021-06-28 13:46:35 MarkDuplicates  Read   675,000,000 records.  Elapsed time: 02:05:23s.  Time for last 1,000,000:    6s.  Last read position: chr8:43,820,929
INFO    2021-06-28 13:46:35 MarkDuplicates  Tracking 6757332 as yet unmatched pairs. 293186 records in RAM.
INFO    2021-06-28 13:46:45 MarkDuplicates  Read   676,000,000 records.  Elapsed time: 02:05:33s.  Time for last 1,000,000:    9s.  Last read position: chr8:46,856,120
INFO    2021-06-28 13:46:45 MarkDuplicates  Tracking 6757777 as yet unmatched pairs. 282813 records in RAM.
INFO    2021-06-28 13:46:50 MarkDuplicates  Read   677,000,000 records.  Elapsed time: 02:05:38s.  Time for last 1,000,000:    5s.  Last read position: chr8:48,968,815
INFO    2021-06-28 13:46:50 MarkDuplicates  Tracking 6752661 as yet unmatched pairs. 253411 records in RAM.

I have tried switching to the more recent gatk MarkDuplicatesSpark command, but it also failed a few hours into the analysis without any error message.

The input bam file has no errors according to Picard ValidateSamFile.

EDIT: the problem has now been encountered with multiple distinct bam files.

Does anybody have any suggestion as to what I could do to pinpoint the problem?

Thanks!

picard markduplicates bam gatk

How much memory do you have available? Is this WGS?


Yes, it is human WGS. I have 680 GB on the local hard disk and 48 GB RAM.
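Disk capacity alone does not show whether the filesystems MarkDuplicates writes to still have room during a long run. Those are the output directory and the temporary directory; by default Picard's temporary files go to Java's temporary directory (`java.io.tmpdir`, typically `/tmp` on Linux), which may sit on a different filesystem than the data disk. The helper below is a hypothetical sketch using POSIX `df -P`; the name `free_gb` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the space still available on the filesystem
# that holds the given path, in whole GiB. -P forces the portable
# one-line-per-filesystem format and -k uses 1024-byte blocks, so column 4
# ("Available") is in kB.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Example usage: check the temporary and output locations before a long run.
# for dir in /tmp .; do echo "$dir: $(free_gb "$dir") GiB free"; done
```

Running it periodically (for example every few minutes while MarkDuplicates is running) shows whether free space shrinks towards zero shortly before the process dies.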


Hi Jordi. This looks completely normal to me. What you posted is the information that MarkDuplicates writes to the console, as the word INFO at the beginning of each line shows.

Best,

Jordi Planells


Hi Jordi,

Yes, this is the normal console output. The problem is that the command stops at this point and never finishes. Also, no output file is produced.


Could your .bam file be truncated? Please post the error message you get at the end, when the program crashes. From what you have posted, there is little we can say.
EDIT: Truncated .bam files can occur when you run out of space on your hard disk. Since your data seem to be quite large, this might be the case.
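One quick way to test the truncation hypothesis without extra tools is to check whether the file ends with the fixed 28-byte BGZF end-of-file block defined in the SAM/BAM specification; a BAM whose writing was cut short (for example by a full disk) will usually be missing it. This is only a sketch: `has_bgzf_eof` is a hypothetical helper, and it checks the trailer only, not the integrity of the records before it.

```shell
#!/bin/sh
# The 28-byte BGZF EOF block, in hex, grouped as: gzip magic/method/flags,
# MTIME, XFL/OS/XLEN, "BC" subfield id + length, BSIZE + empty deflate
# data, CRC32, ISIZE.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

# Hypothetical helper: exit status 0 if the file ends with the BGZF EOF
# block, non-zero otherwise (including when the file is missing or tiny).
has_bgzf_eof() {
    # Dump the last 28 bytes as hex and strip od's spaces and newlines.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Example usage (input.bam is a placeholder path):
# if has_bgzf_eof input.bam; then echo "EOF block present"; else echo "possibly truncated"; fi
```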


The problem is that it does not report any error message; the lines I posted above are the last console output the command prints. I am running ValidateSamFile on the input file right now, but I doubt the file is the problem, as I used Picard tools to create it and they all ran without errors or warnings.

I think it is a problem with my virtual machine's resources, but this would be strange, as I have run other commands (bwa mem, FastqToSam, MergeBamAlignment) that I would expect to be at least as demanding as MarkDuplicates.

EDIT: ValidateSamFile found no errors in the input bam file. Would a truncated bam file be flagged as such by the tool?
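If the virtual machine is running out of resources, the JVM may be killed from outside (for example by the Linux out-of-memory killer), in which case Java gets no chance to print an error and the log simply stops, as above. The shell still records this in the exit status: a status above 128 typically means the process was terminated by signal (status − 128), and 137 corresponds to SIGKILL (signal 9), which is what the OOM killer sends. The wrapper below is a hypothetical sketch for surfacing that; `run_and_report` is a made-up name and the `gatk` line in the usage comment is only a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, then report whether it seems to have
# been terminated by a signal rather than exiting on its own.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        echo "command terminated by signal $sig (exit status $status)" >&2
        if [ "$sig" -eq 9 ]; then
            # SIGKILL cannot be caught, so the program itself logs nothing.
            echo "SIGKILL: on Linux this is often the out-of-memory killer;" \
                 "check 'dmesg' or the system journal" >&2
        fi
    else
        echo "command exited with status $status" >&2
    fi
    return "$status"
}

# Example usage (file names are placeholders):
# run_and_report gatk MarkDuplicates -I input.bam -O output.bam -M metrics.txt
```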

2.3 years ago

I had the same problem and solved it with this command:

    gatk --java-options "-Xmx24G -Xms24G -XX:ConcGCThreads=1 -Djava.io.tmpdir=/tmp" \
        MarkDuplicatesSpark \
        -I input.bam \
        -O output.bam \
        -M marked_metrics.txt \
        --spark-master 'local[*]' \
        --conf 'spark.executor.cores=6'

I ran it on a cluster with 32 GB RAM and 8 cores.
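The answer above gives the JVM a 24 GB heap on a 32 GB machine, i.e. three quarters of physical RAM, leaving the remainder for memory outside the Java heap and for the operating system. To carry the same ratio over to a machine with a different amount of RAM (such as the 48 GB VM in the question), a sketch like the following computes it from `/proc/meminfo` on Linux. The 3/4 ratio is an assumption copied from this single answer, not a GATK recommendation, and `heap_gb` and `mem_total_kb` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: given total RAM in kB, print 3/4 of it in whole GiB.
# The 3/4 ratio mirrors -Xmx24G on a 32 GB machine; it is an assumption.
heap_gb() {
    total_kb=$1
    echo $(( total_kb * 3 / 4 / 1048576 ))
}

# Linux only: MemTotal in /proc/meminfo is reported in kB.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Example usage (file names are placeholders):
# xmx="$(heap_gb "$(mem_total_kb)")G"
# gatk --java-options "-Xmx${xmx}" MarkDuplicatesSpark -I input.bam -O output.bam -M marked_metrics.txt
```

MemTotal is normally a little below the nominal RAM size because the kernel reserves some memory, so on a machine sold as 32 GB this will likely print 23 rather than 24, which errs on the safe side.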
