Hi, I'm having a problem using Picard Tools MarkDuplicates. I had two FASTQ files, one for the forward reads and one for the reverse reads, both from the same lane, etc. I aligned them with bwa mem using the -M option to get a BAM file, then ran Picard FixMateInformation and SortSam, followed by samtools index. When I run MarkDuplicates, I get this error:
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 4: Sample_AD_096:HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
    at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
    at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:407)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:150)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:134)
[E::hts_open] fail to open file 'Sample_AD_096_b37_aln-pe.bam'
When I grepped the read name, I got 4 lines back: the primary forward and reverse reads, which differ in their SAM flags (131 and 115), followed by the secondary forward and reverse reads, which differ the same way (387 and 371).
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 131 2 233666062 60 35S66M 5 154316645 0 CCCATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACCCTGTCTCTACAAATAATAAAAAAGCCAGGCATGGTGGTATGCACCTGTAGTCCCAGCTACTT BBBFFFFFFFFFFIFFFBFFFFFIFBFFBFFFFFIFFFFFII<FFFFIIIFFFIIIIIIFFFFFFBBBBFFFFBBBBBBFFFFFFFFFFFFFFF<BBFFFB SA:Z:5,154316645,+,39M62S,60,0; MD:Z:66 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:66 XS:i:44
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 115 2 233666062 60 32S69M 5 154316648 0 ATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACCCTGTCTCTACAAATAATAAAAAAGCCAGGCATGGTGGTATGCACCTGTAGTCCCAGCTACTTGGG 77FFBFFFFFFFFFFFFFFFFBBBBBFFFFFFB7BBFIIIFFFFFBFFFFFFFIIFBFFFFFFIIIFBFIF<F<FBFBBFFFFF<FBFFFFFFFFFFFBBB SA:Z:5,154316648,-,36M65S,60,0; MD:Z:69 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:69 XS:i:46
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 387 5 154316645 60 39M62H 2 233666062 0 CCCATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACC BBBFFFFFFFFFFIFFFBFFFFFIFBFFBFFFFFIFFFF SA:Z:2,233666062,+,35S66M,60,0; MD:Z:39 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:39 XS:i:0
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 371 5 154316648 60 36M65H 2 233666062 0 ATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACC 77FFBFFFFFFFFFFFFFFFFBBBBBFFFFFFB7BB SA:Z:2,233666062,-,32S69M,60,0; MD:Z:36 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:36 XS:i:0
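Decoding the FLAG column bit by bit makes these four records easier to compare. Below is a minimal Python sketch using the bit values from the SAM specification; the short names are informal labels chosen here, not Picard's or samtools' own. The SECONDARY bit on the last two records is what bwa mem -M sets on the shorter hits of a split alignment.

```python
from typing import List

# FLAG bits as defined in the SAM format specification (labels are informal)
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAPPED",
    0x8: "MATE_UNMAPPED",
    0x10: "REVERSE",
    0x20: "MATE_REVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QC_FAIL",
    0x400: "DUPLICATE",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The four flags from the grep output above
    for flag in (131, 115, 387, 371):
        print(flag, "|".join(decode_flag(flag)))
```

Run as a script, it prints one line per flag, for example `131 PAIRED|PROPER_PAIR|READ2`; the two secondary records carry the same bits as their primaries plus SECONDARY.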
I tried keeping only the primary alignments with samtools view -b -F 0x4 -F 0x100 -F 0x800 both.bam > primary.bam, but got the same error, even though grepping the read name now returned only the primary forward and reverse reads (flags 131 and 115). I did remember to index this BAM. I don't know how to solve this. I have two FASTQ files, so I can't use MergeBamAlignment. Should I combine the FASTQ files into one? My grep output from the primaries-only BAM:
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 131 2 233666062 60 35S66M 5 154316645 0 CCCATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACCCTGTCTCTACAAATAATAAAAAAGCCAGGCATGGTGGTATGCACCTGTAGTCCCAGCTACTT BBBFFFFFFFFFFIFFFBFFFFFIFBFFBFFFFFIFFFFFII<FFFFIIIFFFIIIIIIFFFFFFBBBBFFFFBBBBBBFFFFFFFFFFFFFFF<BBFFFB SA:Z:5,154316645,+,39M62S,60,0; MD:Z:66 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:66 XS:i:44
HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476 115 2 233666062 60 32S69M 5 154316648 0 ATCGTCGGAGCTGGTGGCACAGAGGTTGTACTGACCCTGTCTCTACAAATAATAAAAAAGCCAGGCATGGTGGTATGCACCTGTAGTCCCAGCTACTTGGG 77FFBFFFFFFFFFFFFFFFFBBBBBFFFFFFB7BBFIIIFFFFFBFFFFFFFIIFBFFFFFFIIIFBFIF<F<FBFBBFFFFF<FBFFFFFFFFFFFBBB SA:Z:5,154316648,-,36M65S,60,0; MD:Z:69 RG:Z:Sample_AD_096 NM:i:0 MQ:i:60 AS:i:69 XS:i:46
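The three exclusion bits can also be combined into one mask, 0x4 | 0x100 | 0x800 = 0x904, which samtools view accepts as a single option (-F 0x904). The Python sketch below applies the same mask test to SAM text lines, such as those printed by samtools view, to show which of the four records survive; `primary_mapped` is a hypothetical helper, not part of any library. Note that filtering only drops records: the surviving primaries keep whatever RNEXT/PNEXT values were written earlier (here, positions on chromosome 5), which would be consistent with the error persisting after filtering.

```python
from typing import Iterable, Iterator

# FLAG bits from the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
EXCLUDE_MASK = UNMAPPED | SECONDARY | SUPPLEMENTARY  # 0x904 == 2308


def primary_mapped(sam_lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through unchanged and keep only alignment lines whose
    FLAG has none of the EXCLUDE_MASK bits set (the same test as -F 0x904)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if (flag & EXCLUDE_MASK) == 0:
            yield line


if __name__ == "__main__":
    name = "HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476"
    # QNAME, FLAG, RNAME and POS of the four records above; other columns omitted
    demo = [
        f"{name}\t131\t2\t233666062",
        f"{name}\t115\t2\t233666062",
        f"{name}\t387\t5\t154316645",
        f"{name}\t371\t5\t154316648",
    ]
    for line in primary_mapped(demo):
        print(line.split("\t")[1])
```

Run as a script, it prints 131 and then 115: only the two primary records pass, matching the grep output above.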
Thank you. I think something weird happened with the mate assignments: the secondary alignments seem to have been assigned as mates of the primary reads.
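This pattern can be checked mechanically: for each read name, compare each primary record's RNEXT/PNEXT with the RNAME/POS of the other primary record, and when they disagree, see whether the record points at one of its own SA:Z entries instead. A sketch in Python, assuming uncompressed tab-separated SAM text (for example piped from samtools view -h); `check_mates` and `own_sa_positions` are hypothetical helpers, and this is not how Picard itself validates mates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def own_sa_positions(fields: List[str]) -> List[Tuple[str, int]]:
    """Return (RNAME, POS) for every entry of this record's SA:Z tag, if any.

    SA:Z entries look like 'rname,pos,strand,CIGAR,mapQ,NM;'."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("SA:Z:"):
            entries = [e for e in tag[5:].split(";") if e]
            return [(e.split(",")[0], int(e.split(",")[1])) for e in entries]
    return []


def check_mates(sam_lines: Iterable[str]) -> List[str]:
    """Report primary records whose RNEXT/PNEXT do not point at the other
    primary record with the same QNAME."""
    primaries: Dict[str, List[List[str]]] = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary records take part in the mate check
        primaries[fields[0]].append(fields)

    problems: List[str] = []
    for qname, recs in primaries.items():
        if len(recs) != 2:
            problems.append(f"{qname}: {len(recs)} primary record(s), expected 2")
            continue
        for rec, mate in ((recs[0], recs[1]), (recs[1], recs[0])):
            rnext = rec[2] if rec[6] == "=" else rec[6]  # '=' means same as RNAME
            pointed = (rnext, int(rec[7]))
            actual = (mate[2], int(mate[3]))
            if pointed != actual:
                where = ("its own SA alignment" if pointed in own_sa_positions(rec)
                         else "an unrelated position")
                problems.append(
                    f"{qname} flag {rec[1]}: mate set to {pointed[0]}:{pointed[1]} "
                    f"({where}), but the primary mate is at {actual[0]}:{actual[1]}"
                )
    return problems


if __name__ == "__main__":
    name = "HWI-ST1341:97:C7CKRACXX:8:2201:11012:61476"
    # The two primary records from the question, with SEQ/QUAL shortened to '*'
    demo = [
        "\t".join([name, "131", "2", "233666062", "60", "35S66M", "5", "154316645", "0",
                   "*", "*", "SA:Z:5,154316645,+,39M62S,60,0;"]),
        "\t".join([name, "115", "2", "233666062", "60", "32S69M", "5", "154316648", "0",
                   "*", "*", "SA:Z:5,154316648,-,36M65S,60,0;"]),
    ]
    for problem in check_mates(demo):
        print(problem)
```

On the two primary records from the question, both are reported as pointing at their own SA alignment on chromosome 5 while the real primary mate sits at 2:233666062.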
Glad you figured it out.