I use Picard to mark duplicates, and here is my command:
java -d64 -server -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Xms8g -Xmx16g -Djava.io.tmpdir=tmp -jar ./picard.jar MarkDuplicates I=input.bam O=out_markdup.bam METRICS_FILE=out.metrics ASO=coordinate VALIDATION_STRINGENCY=LENIENT
It works well, but when the input.bam file gets bigger, it becomes very slow! I found that Picard MarkDuplicates doesn't support multiple threads. So, is there any way to speed up Picard? Alternatively, is there better software that does the same job as Picard MarkDuplicates in less time? I know elPrep is another option, but it needs a very large amount of memory!
Besides, I found that samtools can also remove duplicates, but from what I have read, samtools cannot remove duplicates across different chromosomes, so Picard is more universal.
Any reply will be much appreciated!
MarkDuplicates supports multiple GC threads.