I'm having a problem calling SNPs from a sorted BAM file with GATK HaplotypeCaller. The file is a merge of two technical replicates that were each sequenced on a different lane and aligned to the reference genome. When I ran Picard ValidateSamFile on the BAM file, I got the following error: "ValidateSamFile Value was put into PairInfoMap more than once." I also ran ValidateSamFile on the intermediate SAM file for this sample (before it was converted to BAM and sorted), and it did not report that error. So it appears that something about the read group information was disrupted when the SAM file was converted to BAM and sorted with samtools sort.
Has anyone encountered this before, or have any ideas as to what would cause this issue? I tried aligning this sample to the reference genome again using the -M option in bwa-mem (initially I didn't use -M), but that didn't resolve the issue. I haven't been able to find much information about this error, other than suggestions to run Picard AddOrReplaceReadGroups to rename the read group information. I will likely try this next, but I'm going to need to call SNPs on a handful of files like this, so if possible I'd like to figure out and fix the underlying problem before I resort to replacing the read group information for each sample.
Thank you in advance for your help!
cross-posted to another site: https://gatk.broadinstitute.org/hc/en-us/community/posts/360077152632-Error-ValidateSamFile-Value-was-put-into-PairInfoMap-more-than-once-
This is only the first part of the message. What is the second part? ( https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/samtools/CoordinateSortedPairInfoMap.java#L132 )
Thanks for your reply. Unfortunately, there is no other part to the error message. After "ValidateSamFile Value was put into PairInfoMap more than once" there is a "1:" followed by a 31 digit sequence, and then things like the date/time and a finished message are output.
please run
what is the output ?
It found 6 matches:
Where is "the-31-digit-sequence" in the first column?
I didn't copy the first column with that information. I didn't know that sequence was important, but it was there in the first column. Would that help to figure out what the issue is?
It was A00975:57:HHGM7DRXX:2:2122:1922:36949 for each of the 6 rows
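Since the same read name shows up in 6 rows, one thing worth checking is how many of those rows are primary alignments. The following is only a rough sketch in Python, under the assumption (not a confirmed diagnosis for this file) that the error is related to a read name having more than one primary record for the same mate. The function names are made up for illustration; the flag bits (0x40, 0x80, 0x100, 0x800) are the ones defined in the SAM specification.

```python
from collections import Counter

# FLAG bits from the SAM specification
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_records(sam_lines):
    """Count primary alignment records per (read name, mate number)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@')
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # Secondary and supplementary records are not primary alignments
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & FIRST_IN_PAIR:
            mate = 1
        elif flag & SECOND_IN_PAIR:
            mate = 2
        else:
            mate = 0  # unpaired read, or mate information missing
        counts[(qname, mate)] += 1
    return counts


def duplicated_primaries(sam_lines):
    """Return only the (read name, mate) keys seen more than once."""
    counts = count_primary_records(sam_lines)
    return {key: n for key, n in counts.items() if n > 1}
```

To try it, the text output of `samtools view` could be fed in line by line (for example by passing `sys.stdin` to `duplicated_primaries`). Any key it reports is a read that has more than one primary record for the same mate.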
This doesn't look like a SAM record. For example, the CIGAR string 73M78S should be in the 6th column, while in your example it's in the 5th column.
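For reference, the SAM specification defines 11 mandatory tab-separated fields, with CIGAR as the 6th. A quick way to check a pasted line is sketched below in Python. The helper names are hypothetical, and the check is deliberately loose: it only looks for the first field shaped like a CIGAR string with at least one operation, so it is a sanity check and not a validator.

```python
import re

# One or more <length><operation> pairs, using the CIGAR operations
# listed in the SAM specification (a bare "*" is not matched here).
CIGAR_WITH_OPS = re.compile(r"(?:[0-9]+[MIDNSHPX=])+")


def cigar_column(line):
    """Return the 1-based column of the first CIGAR-like field, or None."""
    fields = line.rstrip("\n").split("\t")
    for index, field in enumerate(fields, start=1):
        if CIGAR_WITH_OPS.fullmatch(field):
            return index
    return None


def looks_like_sam_record(line):
    """Loose check: at least 11 fields and a CIGAR-like 6th column."""
    fields = line.rstrip("\n").split("\t")
    return len(fields) >= 11 and cigar_column(line) == 6
```

If a line has lost its first column (the read name), `cigar_column` returns 5 instead of 6, which matches the situation described above.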