Split the BAM file by chromosome to speed up Picard MarkDuplicates
7.5 years ago
lghust2011 ▴ 110

I use Picard to mark duplicates and found that Picard does not support multiple threads, so it is very slow. To speed it up, I want to split the BAM file by chromosome and then run Picard on each file. The problem is that, as far as I know, Picard has an advantage over samtools rmdup because Picard can mark cross-chromosome duplicates. So if I split the BAM file by chromosome, how much will that affect the result? Here is my reasoning:

The two reads of a pair must come from the same DNA fragment, so they normally map to the same chromosome. Sometimes, however, the two reads map to different chromosomes, perhaps because of a structural variation or a repeat such as a microsatellite. If I only want to call SNVs and indels, can I ignore the cross-chromosome duplicates? Please let me know if there is anything wrong with my reasoning. Any reply will be much appreciated!
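One way to judge how much this matters for a given dataset is to count how many paired records actually have their mate on a different chromosome. The snippet below is only a rough sketch: the function name `cross_chromosome_fraction` and the demo records are made up for illustration, and it assumes plain SAM text (for example the output of `samtools view -h`) as input. It relies only on the standard SAM columns FLAG, RNAME and RNEXT.

```python
def cross_chromosome_fraction(sam_lines):
    """Fraction of paired, both-mapped primary records whose mate is on another chromosome.

    Hypothetical helper for illustration; expects SAM text lines.
    """
    paired = 0
    cross = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Keep only paired records (0x1) where neither the read (0x4)
        # nor its mate (0x8) is unmapped.
        if not flag & 0x1 or flag & 0x4 or flag & 0x8:
            continue
        # Ignore secondary (0x100) and supplementary (0x800) alignments.
        if flag & 0x900:
            continue
        paired += 1
        # RNEXT is "=" when the mate maps to the same reference as the read.
        if rnext != "=" and rnext != rname:
            cross += 1
    return cross / paired if paired else 0.0


# Made-up demo records: one same-chromosome pair, one cross-chromosome pair,
# and one unmapped single read.
demo = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t65\tchr1\t500\t60\t4M\tchr2\t700\t0\tACGT\tIIII",
    "r2\t129\tchr2\t700\t60\t4M\tchr1\t500\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
fraction = cross_chromosome_fraction(demo)
```

If the fraction on real data is very small, the pairs that a per-chromosome split handles differently are a correspondingly small part of the input; whether that is acceptable for SNV and indel calling remains a judgment call.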

markduplicates next-gen sequencing alignment • 3.2k views

Alternatively, if the effect is significant, how can I compensate for it?


Alternatively, you can use Clumpify (from the BBTools suite), which does duplicate marking or duplicate removal prior to mapping and is extremely fast.

7.5 years ago

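You could split your BAM into both-mapped-chr1.bam, both-mapped-chr2.bam, both-mapped-chr3.bam, (..), and 'others.bam', so that pairs with both reads on the same chromosome go to that chromosome's file and everything else goes to 'others.bam'.

However, I don't know whether creating those new BAMs will reduce the computing time.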
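The routing rule behind such a split could be sketched as below. This is only an illustration under stated assumptions: `bucket_for` and `split_records` are hypothetical names, the input is SAM text lines rather than a BAM, and any record that is not part of a pair with both mates mapped to the same chromosome (unpaired, unmapped, or cross-chromosome) is sent to "others". Writing the actual BAM files and merging the marked outputs afterwards is left out.

```python
from collections import defaultdict


def bucket_for(sam_line):
    """Return the output bucket name for one SAM record (hypothetical helper)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    paired = flag & 0x1
    read_unmapped = flag & 0x4
    mate_unmapped = flag & 0x8
    # RNEXT is "=" (or repeats RNAME) when the mate is on the same chromosome.
    same_chrom = rnext in ("=", rname)
    if paired and not read_unmapped and not mate_unmapped and same_chrom:
        return f"both-mapped-{rname}"
    # Assumption: everything else shares one bucket, so cross-chromosome
    # pairs stay together and can still be compared with each other.
    return "others"


def split_records(sam_lines):
    """Group non-header SAM lines by bucket name."""
    buckets = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # header lines would be copied to every output
            continue
        buckets[bucket_for(line)].append(line)
    return dict(buckets)


# Made-up records for illustration.
records = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t65\tchr1\t500\t60\t4M\tchr2\t700\t0\tACGT\tIIII",
    "r2\t129\tchr2\t700\t60\t4M\tchr1\t500\t0\tACGT\tIIII",
]
groups = split_records(records)
```

Because both mates of a pair always land in the same bucket under this rule, each file can in principle be processed on its own. The extra read and write pass over the data is the cost that may cancel out the gain from running the files in parallel.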
