Improving Performance Of Picard For Markduplicates
13.5 years ago
Brett Mccann ▴ 80

I'm trying to improve the performance of MarkDuplicates when processing a BAM file. I am running on a 12 core box with 64GB of RAM. I have been using the following picard command on my BAM file:

/usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar $PICARD_HOME/picard-1_42/MarkDuplicates.jar METRICS_FILE=rmdup_metrics.txt COMPRESSION_LEVEL=1 INPUT=merged.bam OUTPUT=dedup_clpc.bam REMOVE_DUPLICATES=True ASSUME_SORTED=True VALIDATION_STRINGENCY=LENIENT

Are there any threading options that might increase performance? Before running MarkDuplicates, I also tried indexing the BAM file with samtools, using this command:

$SAM_TOOLS_HOME/samtools index merged.bam

This produced a 'merged.bam.bai' file, but it had no impact on performance.

Are there any other options for pre-processing the BAM file that might impact performance of MarkDuplicates?

picard samtools markduplicates • 14k views

Out of curiosity, why are you using ASSUME_SORTED? I had problems whereby MarkDuplicates wouldn't recognise a samtools sorted file as being sorted. The problem disappeared when I sorted with Picard SortSam instead. I don't have to use the lenient validation anymore either.
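One way to see what sort order a BAM file declares is to read the SO tag on the @HD line of its header. The sketch below is only an illustration: the `sort_order` helper name is hypothetical, and it assumes the header text arrives on stdin (for a real file, from `samtools view -H`). It reports what the header claims, not whether the records are truly in order, so a file sorted by a tool that did not update the header could still show "unsorted".

```shell
# Hypothetical helper: print the SO (sort order) value from SAM header text on stdin.
sort_order() {
  awk -F'\t' '/^@HD/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SO:/) { print substr($i, 4); exit }
  }'
}

# Against a real file (requires samtools on PATH):
#   samtools view -H merged.bam | sort_order

# Demo on a made-up two-line header:
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | sort_order
```

If the header does not report coordinate order, re-sorting before MarkDuplicates (as described above with Picard SortSam) is a safer route than relying on ASSUME_SORTED.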

13.5 years ago

The -XX:ParallelGCThreads setting is just for garbage collection. It won't really affect MarkDuplicates unless MarkDups hits its top memory usage (the -Xmx setting).

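As for -XX:-UseGCOverheadLimit, it only turns off the JVM's GC overhead limit check. If -Xmx wasn't set high enough, MarkDups will keep grinding through garbage collection instead of dying early with an OutOfMemoryError.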

They are all Java HotSpot switches, not MarkDups-specific switches.

I've tried setting Xmx to 4, 10, 40, 60 and 128G, scaling MAX_READS_IN_RAM by the same factor each time (4G == 150000, the default if I remember correctly).

It does make a difference, but it's not substantial, even when running over NFS.
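The scaling described above is simple proportional arithmetic. As a rough sketch, taking the 4G == 150000 baseline from this answer at face value (it is quoted from memory there, not verified here), the paired settings work out as follows. The `scaled_reads` helper name is hypothetical.

```shell
# Hypothetical helper: scale the in-RAM read count by the same factor as the heap,
# using the 4G == 150000 baseline quoted in the answer (an unverified figure).
scaled_reads() {
  heap_gb=$1
  echo $(( 150000 * heap_gb / 4 ))
}

# Print the paired settings for each heap size that was tried.
for size_gb in 4 10 40 60 128; do
  echo "-Xmx${size_gb}g MAX_READS_IN_RAM=$(scaled_reads "$size_gb")"
done
```

For example, a 10G heap pairs with 375000 reads and a 128G heap with 4800000.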

13.5 years ago
Docroberson ▴ 310

Since you're using ASSUME_SORTED, make sure the file really is sorted first, though it sounds like you're already doing that. What does your core usage look like? Are you using all 12? I don't remember how many cores MarkDups will use. You can try this to see if it helps:

-XX:ParallelGCThreads=12

The default should be to use all cores, so I don't know that this will help, but it may be worth a try. You can also try increasing memory, but really 10G should be plenty.

You could also try raising the allocated memory limit and increasing SORTING_COLLECTION_SIZE_RATIO from its 0.25 default. If you get too close to the machine's memory limit, it will probably start spilling to swap, which won't help anything. If you have success, update us on what worked best.
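As a sketch of what that combination might look like, the helper below only prints a candidate command line so the two knobs can be varied together. It does not run Picard. The `markdup_cmd` name and the 20G / 0.5 values are illustrative assumptions, not tested recommendations, and the jar path is shortened from the one in the question.

```shell
# Hypothetical helper: print (not execute) a MarkDuplicates command for a given
# heap size in GB and SORTING_COLLECTION_SIZE_RATIO value.
markdup_cmd() {
  heap_gb=$1
  ratio=$2
  printf '%s\n' "java -Xmx${heap_gb}g -jar MarkDuplicates.jar INPUT=merged.bam OUTPUT=dedup_clpc.bam METRICS_FILE=rmdup_metrics.txt ASSUME_SORTED=True SORTING_COLLECTION_SIZE_RATIO=${ratio}"
}

# Example values only; keep the heap well below the machine's physical RAM.
markdup_cmd 20 0.5
```

Timing a few such combinations on the same input is the only reliable way to tell whether either setting helps.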
