MarkDuplicates Error
12.7 years ago

Hello,

I'm processing whole-genome BAM files. Since I'm specifically interested in chromosome 11, I split my files and am working only on this chromosome. However, when I tried to run MarkDuplicates on the chr11 BAM files, it gave the following error:

SAM validation error: WARNING: Record 23, Read name IL21_1665:3:25:467:1485, Paired read should be marked as first of pair or second of pair.

Running ValidateSamFile on it produced hundreds of warnings with the same message. The same error also occurred with other Picard tools, such as FixMateInformation. At first, I thought the problem might be related to inter-chromosomal pairs, where the mate of a read is not present in my BAM file. Then I saw this answer on Picard's FAQ page:
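
A quick way to check how many reads in the split file have their mate on another chromosome is to look at the RNEXT column (7th field) of each SAM record. The sketch below assumes plain-text SAM input (for example, made with `samtools view -h chr11.bam > chr11.sam`; the file names are hypothetical) and simply lists paired reads whose RNEXT names a different reference. It is not a Picard feature, just a sanity check.

```python
import sys
from typing import Iterable, List


def inter_chromosomal_reads(sam_lines: Iterable[str]) -> List[str]:
    """Return names of paired reads whose mate is reported on another reference.

    Assumes plain-text SAM lines; header lines starting with '@' are skipped.
    """
    names = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # 0x1 = read is paired. RNEXT '=' means "same reference as RNAME",
        # '*' means the mate's reference is unknown.
        if flag & 0x1 and rnext not in ("=", "*", rname):
            names.append(fields[0])
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py chr11.sam  (hypothetical file name)
    with open(sys.argv[1]) as handle:
        hits = inter_chromosomal_reads(handle)
    print(f"{len(hits)} paired records with a mate on another chromosome")
```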

"If your reads have been divided into separate BAMs by chromosome, inter-chromosomal pairs will not be identified, but MarkDuplicates will not fail due to inability to find the mate pair for a read."

Right now, I'm confused and I don't know how to solve this. Should I run MarkDuplicates on the whole-genome file?

Thanks

markduplicates next-gen sequencing picard

12.7 years ago

Your best bet is to set VALIDATION_STRINGENCY to either SILENT or LENIENT. This is unlikely to affect the correctness of the results.
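
For reference, here is a sketch of what that invocation might look like, written as a small Python helper that builds the command line. It assumes the classic Picard `KEY=VALUE` argument style and a local `picard.jar`; the jar path and file names are hypothetical placeholders, and depending on your Picard version you may need the newer `-I`/`-O`/`-M` style instead.

```python
import shlex
from typing import List

# The three values Picard accepts for VALIDATION_STRINGENCY.
VALID_STRINGENCIES = ("STRICT", "LENIENT", "SILENT")


def markduplicates_cmd(picard_jar: str, in_bam: str, out_bam: str,
                       metrics: str, stringency: str = "LENIENT") -> List[str]:
    """Build a MarkDuplicates command line (classic KEY=VALUE syntax assumed)."""
    if stringency not in VALID_STRINGENCIES:
        raise ValueError(f"unknown stringency: {stringency}")
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics}",
        # LENIENT logs validation problems as warnings; SILENT suppresses them.
        f"VALIDATION_STRINGENCY={stringency}",
    ]


if __name__ == "__main__":
    cmd = markduplicates_cmd("picard.jar", "chr11.bam",
                             "chr11.dedup.bam", "chr11.metrics.txt")
    print(shlex.join(cmd))
```

To actually run it rather than print it, pass the list to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell-quoting problems with unusual file names.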

12.7 years ago
Swbarnes2 ★ 1.6k

It looks like the software that made your .bam set the flags incorrectly. Picard seems to be complaining that your .bam entries have flag bit 1 (read paired) set, but neither 64 (first in pair) nor 128 (second in pair).
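
To make the flag arithmetic concrete: FLAG is a bit field, so a record with FLAG 1 has only the "read paired" bit set, while a correctly labelled mate would carry 65 (1 + 64) or 129 (1 + 128). The sketch below decodes a FLAG value and tests for exactly the condition Picard warns about; the function names are my own, not part of Picard or samtools.

```python
from typing import List

# Standard SAM FLAG bits and their meanings.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails platform/vendor quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> List[str]:
    """List the meanings of every bit set in a FLAG value."""
    return [desc for bit, desc in SAM_FLAG_BITS.items() if flag & bit]


def lacks_mate_order(flag: int) -> bool:
    """True when the read is paired (0x1) but neither 0x40 nor 0x80 is set."""
    return bool(flag & 0x1) and not (flag & (0x40 | 0x80))


if __name__ == "__main__":
    for value in (1, 65, 129):
        print(value, decode_flag(value), lacks_mate_order(value))
```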

But as long as the read names are identical between the two mates, Picard might still be able to figure out that they are paired, and it will still know which pairs have the same coordinates.
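
A rough illustration of why the first/second-of-pair bits are not strictly needed for this: if records are grouped by read name and the two ends are sorted, the pair's coordinates form the same key no matter which mate is "first". This is a deliberate simplification, assuming plain-text SAM input: Picard's real duplicate key uses the unclipped 5' position of each end, orientation and library, whereas this sketch just uses reference, POS and strand.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

End = Tuple[str, int, bool]  # (reference, POS, is_reverse)


def pairs_by_name(sam_lines: Iterable[str]) -> Dict[str, List[End]]:
    """Group primary alignments by QNAME; FLAG 0x40/0x80 is never consulted."""
    groups: Dict[str, List[End]] = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip secondary (0x100) and supplementary (0x800) records.
        if flag & 0x900:
            continue
        groups[fields[0]].append((fields[2], int(fields[3]), bool(flag & 0x10)))
    return groups


def duplicate_candidates(sam_lines: Iterable[str]) -> List[List[str]]:
    """Return groups of read names whose two ends share the same simplified key."""
    by_key: Dict[Tuple[End, ...], List[str]] = defaultdict(list)
    for name, ends in pairs_by_name(sam_lines).items():
        if len(ends) != 2:
            continue  # mate missing, e.g. it is in another chromosome's BAM
        # Sorting the ends makes the key independent of mate order.
        by_key[tuple(sorted(ends))].append(name)
    return [sorted(names) for names in by_key.values() if len(names) > 1]
```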

And yes, you can run Picard with VALIDATION_STRINGENCY set to LENIENT, and it will likely do its thing despite that problem.

Thanks for your answers. I think it worked when setting VALIDATION_STRINGENCY to LENIENT. Still, I don't understand why I'm having this problem. I'm using BAM files aligned and produced by the Sanger Institute, and I'm processing them with my own pipeline to call SNPs. The steps leading to BAM production should be OK, right? But these BAMs don't even pass ValidateSamFile!

Look at a little of the original .sam file with your eyeballs. What flags do you see? Could your pipeline be changing those flags?
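
If you want to go beyond eyeballing, a small tally of FLAG values makes it easy to compare the original file with the output of your pipeline. This sketch assumes plain-text SAM input and counts every FLAG value plus the number of paired records that have neither 0x40 nor 0x80 set; running it on both files (the file names are up to you) should show at which step, if any, the bits disappear.

```python
import sys
from collections import Counter
from typing import Iterable, Tuple


def flag_summary(sam_lines: Iterable[str]) -> Tuple[Counter, int]:
    """Tally FLAG values and count paired records lacking 0x40/0x80."""
    counts: Counter = Counter()
    missing_order = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd SAM column
        counts[flag] += 1
        # Paired (0x1) but neither first (0x40) nor second (0x80) in pair.
        if flag & 0x1 and not (flag & 0xC0):
            missing_order += 1
    return counts, missing_order


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        counts, missing = flag_summary(handle)
    for flag, n in counts.most_common(10):
        print(f"{flag}\t{n}")
    print(f"paired records without first/second-of-pair bit: {missing}")
```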
