Hi,
I am running Picard's MarkDuplicates command on a bam file that I generated according to GATK's best practices guidelines.
The command has now failed multiple times at different stages without any error message or warning. The last few lines of output from the most recent attempt are the following:
INFO 2021-06-28 13:43:06 MarkDuplicates Read 665,000,000 records. Elapsed time: 02:01:54s. Time for last 1,000,000: 5s. Last read position: chr8:30,463,617
INFO 2021-06-28 13:43:06 MarkDuplicates Tracking 6745737 as yet unmatched pairs. 344808 records in RAM.
INFO 2021-06-28 13:43:18 MarkDuplicates Read 666,000,000 records. Elapsed time: 02:02:06s. Time for last 1,000,000: 11s. Last read position: chr8:32,749,364
INFO 2021-06-28 13:43:18 MarkDuplicates Tracking 6744138 as yet unmatched pairs. 339607 records in RAM.
INFO 2021-06-28 13:43:23 MarkDuplicates Read 667,000,000 records. Elapsed time: 02:02:11s. Time for last 1,000,000: 5s. Last read position: chr8:35,027,712
INFO 2021-06-28 13:43:23 MarkDuplicates Tracking 6742550 as yet unmatched pairs. 334065 records in RAM.
INFO 2021-06-28 13:44:39 MarkDuplicates Read 668,000,000 records. Elapsed time: 02:03:27s. Time for last 1,000,000: 76s. Last read position: chr8:37,295,885
INFO 2021-06-28 13:44:39 MarkDuplicates Tracking 6740947 as yet unmatched pairs. 328694 records in RAM.
INFO 2021-06-28 13:44:47 MarkDuplicates Read 669,000,000 records. Elapsed time: 02:03:35s. Time for last 1,000,000: 7s. Last read position: chr8:39,537,579
INFO 2021-06-28 13:44:47 MarkDuplicates Tracking 6739367 as yet unmatched pairs. 323206 records in RAM.
INFO 2021-06-28 13:45:57 MarkDuplicates Read 670,000,000 records. Elapsed time: 02:04:45s. Time for last 1,000,000: 69s. Last read position: chr8:41,790,226
INFO 2021-06-28 13:45:57 MarkDuplicates Tracking 6737773 as yet unmatched pairs. 317727 records in RAM.
INFO 2021-06-28 13:46:03 MarkDuplicates Read 671,000,000 records. Elapsed time: 02:04:51s. Time for last 1,000,000: 5s. Last read position: chr8:43,092,877
INFO 2021-06-28 13:46:03 MarkDuplicates Tracking 7007382 as yet unmatched pairs. 583427 records in RAM.
INFO 2021-06-28 13:46:11 MarkDuplicates Read 672,000,000 records. Elapsed time: 02:04:59s. Time for last 1,000,000: 7s. Last read position: chr8:43,092,916
INFO 2021-06-28 13:46:11 MarkDuplicates Tracking 7094903 as yet unmatched pairs. 668562 records in RAM.
INFO 2021-06-28 13:46:22 MarkDuplicates Read 673,000,000 records. Elapsed time: 02:05:10s. Time for last 1,000,000: 11s. Last read position: chr8:43,094,783
INFO 2021-06-28 13:46:22 MarkDuplicates Tracking 7144551 as yet unmatched pairs. 715971 records in RAM.
INFO 2021-06-28 13:46:29 MarkDuplicates Read 674,000,000 records. Elapsed time: 02:05:17s. Time for last 1,000,000: 7s. Last read position: chr8:43,095,887
INFO 2021-06-28 13:46:29 MarkDuplicates Tracking 6905139 as yet unmatched pairs. 474456 records in RAM.
INFO 2021-06-28 13:46:35 MarkDuplicates Read 675,000,000 records. Elapsed time: 02:05:23s. Time for last 1,000,000: 6s. Last read position: chr8:43,820,929
INFO 2021-06-28 13:46:35 MarkDuplicates Tracking 6757332 as yet unmatched pairs. 293186 records in RAM.
INFO 2021-06-28 13:46:45 MarkDuplicates Read 676,000,000 records. Elapsed time: 02:05:33s. Time for last 1,000,000: 9s. Last read position: chr8:46,856,120
INFO 2021-06-28 13:46:45 MarkDuplicates Tracking 6757777 as yet unmatched pairs. 282813 records in RAM.
INFO 2021-06-28 13:46:50 MarkDuplicates Read 677,000,000 records. Elapsed time: 02:05:38s. Time for last 1,000,000: 5s. Last read position: chr8:48,968,815
INFO 2021-06-28 13:46:50 MarkDuplicates Tracking 6752661 as yet unmatched pairs. 253411 records in RAM.
I have tried switching to the more recent gatk MarkDuplicatesSpark command, but it again failed a few hours into the analysis without any error message.
The input bam file has no errors according to Picard ValidateSamFile.
EDIT: the problem has now been encountered with multiple distinct bam files.
Does anybody have any suggestion as to what I could do to pinpoint the problem?
Thanks!
How much memory do you have available? Is this WGS?
Yes, it is human WGS. 680 GB on the local hard disk and 48 GB RAM.
Hi Jordi. This looks completely normal to me. What you posted is information that MarkDuplicates writes to the console, as you can see from the word INFO at the beginning of each line.
Best,
Jordi Planells
Hi Jordi,
Yes, this would be the normal console output. The problem is that the command stops partway through and never finishes. Also, no output file is produced.
Could your .bam file be truncated? Post the error message you get at the end, when the program crashes. From what you have posted there is little that we can say.
EDIT: Truncated .bam files can occur when you run out of space on your hard disk. Since your data seems pretty big, this might be the case.
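One quick way to test for truncation, independent of Picard, is to look at the last bytes of the file. A complete BAM file ends with a fixed 28-byte empty BGZF block, so a file that lacks it was most likely cut short. The following is a minimal Python sketch of that check; the function name has_bgzf_eof is made up for illustration, and the check only looks at the end of the file, so it will not detect damage in the middle.

```python
import os

# The 28-byte empty BGZF block that terminates every complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

Calling has_bgzf_eof("input.bam") (the path is a placeholder) returns False for a file whose final block is missing, which is the typical signature of a write that stopped when the disk filled up.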
The problem is that it does not report any error message; the lines I posted above are the last console output the command prints. I am running ValidateSamFile right now on the input file, but I doubt that it is truncated, as I used Picard tools to create it and they all ran without errors/warnings.
I think it is a problem with my virtual machine's resources, but this would be strange, as I ran other commands (bwa mem, FastqToSam, MergeBamAlignment) that I imagine are at least as demanding as MarkDuplicates.
EDIT: ValidateSamFile found no errors in the input bam file. Would a truncated bam file be flagged as such by the tool?
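Since the command stops without printing anything, one way to pinpoint the cause is to record how the process ended. The sketch below, in Python, runs a command and reports whether it exited normally, exited with an error status, or was killed by a signal. The helper names describe_exit and run_and_report are made up for illustration, and the command list you pass in is whatever invocation you already use.

```python
import signal
import subprocess


def _signal_name(number):
    """Return the symbolic name of a signal number, or the number itself."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return str(number)


def describe_exit(returncode):
    """Turn a process return code into a human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return f"killed by signal {_signal_name(-returncode)}"
    if returncode > 128:
        # Shell convention: 128 + N usually means killed by signal N.
        return (f"exited with status {returncode} "
                f"(possibly signal {_signal_name(returncode - 128)})")
    return f"exited with status {returncode}"


def run_and_report(command):
    """Run `command` (a list of arguments) and describe how it ended."""
    result = subprocess.run(command)
    return describe_exit(result.returncode)
```

If the report mentions SIGKILL, that would be consistent with the process having been killed from outside, for example by the kernel's out-of-memory killer on a machine with limited RAM, although other causes are possible. The kernel log on the virtual machine is the place to confirm it.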