How to remove duplicates in a sorted bam file using picard?
8.4 years ago
GK1610 ▴ 120

I have a sorted paired-end BAM file.

I want to remove all paired-end duplicates, i.e. pairs in which both mates map to the same positions:

-------->                <----------
-------->                <----------

NOT this kind, where only one mate maps to the same position:

-------->                              <----------
-------->                <----------

Here is what I did:

java -jar -Xmx16g ~/picard/1.68/bin/picard-tools-1.68/MarkDuplicates.jar I=test.sorted.bam O=test.remove.duplicates.bam M=~/test.DupMetrics.txt REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT

I am getting this error:

INFO    2016-07-24 17:31:43 MarkDuplicates  Start of doWork freeMemory: 2014511032; totalMemory: 2025979904; maxMemory: 15271002112
INFO    2016-07-24 17:31:43 MarkDuplicates  Reading input file and constructing read end information.
INFO    2016-07-24 17:31:43 MarkDuplicates  Will retain up to 60599214 data points before spilling to disk.
INFO    2016-07-24 17:31:53 MarkDuplicates  Read 1000000 records. Tracking 7879 as yet unmatched pairs. 752 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:01 MarkDuplicates  Read 2000000 records. Tracking 16466 as yet unmatched pairs. 1245 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:10 MarkDuplicates  Read 3000000 records. Tracking 24087 as yet unmatched pairs. 1610 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:16 MarkDuplicates  Read 4000000 records. Tracking 32294 as yet unmatched pairs. 1716 records in RAM.  Last sequence index: 0
INFO    2016-07-24 17:32:24 MarkDuplicates  Read 5000000 records. Tracking 38993 as yet unmatched pairs. 1676 records in RAM.  Last sequence index: 0
[Sun Jul 24 17:32:25 EDT 2016] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 0.70 minutes.
Runtime.totalMemory()=5749997568
FAQ:  http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  1: HBCC_ACC_382_C:K00225:15:H3VK7BBXX:1:1123:15524:31804
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)
ChIP-Seq • 3.8k views

It works.

Thanks :)


Hi,

I ran into the same error as you. Can you tell me how to track down the cause of "Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once."?

Thanks a lot!
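
One way to investigate this message, offered as a sketch rather than a confirmed diagnosis: a commonly reported cause is a read name that carries more than two primary alignments, for example after merging files that contain the same reads. In this thread the reported fix was simply upgrading Picard, so treat the cause below as an assumption. The function name `find_overcounted_reads` is hypothetical; it reads SAM text on stdin and lists read names with more than two primary records.

```shell
# Sketch: list read names that have more than two primary alignments.
# Input is SAM text on stdin, e.g. piped from:
#   samtools view test.sorted.bam | find_overcounted_reads
find_overcounted_reads() {
    awk -F '\t' '
        /^@/ { next }                              # skip SAM header lines
        {
            flag = $2
            secondary     = int(flag / 256)  % 2   # FLAG bit 0x100
            supplementary = int(flag / 2048) % 2   # FLAG bit 0x800
            if (!secondary && !supplementary) count[$1]++
        }
        END {
            # A proper pair contributes exactly two primary records.
            for (name in count)
                if (count[name] > 2) print name "\t" count[name]
        }
    ' | sort
}
```

If this prints nothing, the file has at most two primary records per read name and the problem probably lies elsewhere, such as an outdated Picard version.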

8.4 years ago

Your version of Picard is just too old: https://sourceforge.net/projects/picard/files/picard-tools/1.68/ was released 4 years ago.

The current version is 2.5: https://github.com/broadinstitute/picard/releases/tag/2.5.0
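
For illustration, here is roughly how the command from the question looks with a 2.x release, where the tools ship in a single `picard.jar` and the tool name is passed as the first argument. This is a sketch: the `run_markdup` wrapper and the `PICARD_JAR` variable are hypothetical, and the jar location is an assumption to adjust for your install. If the jar is not found, the wrapper only prints the command instead of running it.

```shell
# Sketch: MarkDuplicates with the single-jar layout of Picard 2.x.
# Arguments: <picard.jar> <input.bam> <output.bam> <metrics.txt>
run_markdup() {
    jar=$1; in_bam=$2; out_bam=$3; metrics=$4
    # Same options as in the question; only the jar invocation changes.
    set -- java -Xmx16g -jar "$jar" MarkDuplicates \
        "I=$in_bam" "O=$out_bam" "M=$metrics" \
        REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=LENIENT
    if [ -f "$jar" ]; then
        "$@"
    else
        echo "dry run (jar not found): $*"
    fi
}

# PICARD_JAR is an assumed variable; it defaults to picard.jar in the current directory.
run_markdup "${PICARD_JAR:-picard.jar}" \
    test.sorted.bam test.remove.duplicates.bam "$HOME/test.DupMetrics.txt"
```

With `REMOVE_DUPLICATES=true` the duplicates are dropped from the output rather than only flagged, as in the original command.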
