I've been using Picard's MarkDuplicates tool for a while, but recently I ran into a problem: the program gets stuck partway through. I used the METRICS_FILE option, and it seems the program is unable to produce the metrics file (or it would take far too long to do so). The input BAM file is sorted, but I didn't use the ASSUME_SORTED or ASSUME_SORT_ORDER option. The program produced the output BAM file in about 4 hours; it is about 60 GB (somehow slightly bigger than the input BAM) and appears complete (as checked with samtools quickcheck), but it still hadn't produced the metrics file after more than 24 hours (I can see that java is still consuming CPU). The following are the last two lines of the log output (run with -Xmx80g). I've tried different versions (2.3, 2.5, 2.18) and different memory allocations (as high as 200 GB), but got the same result. Does anyone have any idea what's going on? Thanks for any help.
INFO 2018-03-23 22:21:29 MarkDuplicates Before output close freeMemory: 71000711328; totalMemory: 71578943488; maxMemory: 76355207168
INFO 2018-03-23 22:21:29 MarkDuplicates After output close freeMemory: 71000187040; totalMemory: 71578419200; maxMemory: 76355207168
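For context, the run was launched with a command along these lines (the jar location and file names below are placeholders, not the actual paths used):

# Coordinate-sorted WGS input, ~60 GB; the output BAM completed in ~4 hours,
# but the metrics file was never written and java kept running.
java -Xmx80g -jar picard.jar MarkDuplicates \
    INPUT=input.sorted.bam \
    OUTPUT=marked.bam \
    METRICS_FILE=dup_metrics.txt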
It turned out that Picard has a bug. See the bug report I just submitted via the link below.
bug report
Since there is no solution for it yet, I have moved your post to a comment. Once the problem is resolved, please come back to this thread and post the solution here.
Thanks for following up. The Broad Institute should respond to the bug report fairly quickly. Thanks, Kevin.
Does it work if you apply your suggested code?
Just a recommendation: if it's sorted, then let Picard know that.
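For example, with a recent Picard version you could pass the sort order explicitly (older versions use the equivalent ASSUME_SORTED=true flag; file names here are placeholders):

java -Xmx80g -jar picard.jar MarkDuplicates \
    INPUT=input.sorted.bam \
    OUTPUT=marked.bam \
    METRICS_FILE=dup_metrics.txt \
    ASSUME_SORT_ORDER=coordinate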
Is the BAM output 'legit', i.e., does it have an EOF marker?
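You can check that with samtools quickcheck, or inspect the last 28 bytes directly (a valid BAM ends with a fixed 28-byte BGZF EOF block beginning with 1f 8b 08 04); the file name below is a placeholder:

samtools quickcheck -v marked.bam   # verbose mode reports truncated files / missing EOF
tail -c 28 marked.bam | xxd         # should show the fixed BGZF EOF block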
Does it even create the metrics file and start writing to it or does it just hang?
Your BAM files are large... what data is this?
Thanks for your reply. The data is WGS (hence the large files), and the BAM files are complete, as checked with samtools quickcheck. By debugging the source code, I located the cause of the problem: it's a bug in the program, which I've reported. You can see the bug report here.