Hi,
I have used bwa mem to map paired-end fastq files to a reference genome, and the resulting SAM file has duplicated @SQ lines in its header (the sequence dictionary):
bwa mem -M -t 4 ${REF_DIR}${ref} ${DATA_DIR}${p1} ${DATA_DIR}${p2} > ${sam}
The resulting SAM file generates errors using Picard Tools:
Exception in thread "main" java.lang.IllegalArgumentException: Cannot add sequence that already exists in SAMSequenceDictionary: chr1
Converting the SAM to BAM also fails:
[W::sam_hdr_parse] Duplicated sequence 'JTFH01001905.1'
[W::sam_hdr_parse] Duplicated sequence 'JTFH01001906.1'
[W::sam_read1] Parse error at line 29991
[main_samview] truncated file.
How can I replace the SAM header to remove duplicated @SQ lines?
I see there is an old post here: Duplicate @Sq Lines In Sam File Header that suggests the issue can be resolved using the bwa mem -M argument. I am wondering if there is a way to modify the SAM without remapping, as this is a large SAM file.
Thank you!
samtools quickcheck will check for a valid header in your SAM. Also, maybe samtools reheader could be useful? Here's the docs.
Thanks, yes, the SAM passes samtools quickcheck and unfortunately, samtools reheader only takes input BAM files.
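Since the SAM is plain text, one possible way to drop the repeated @SQ lines without remapping is to filter the file with awk. This is only a sketch, not something confirmed in this thread: the function name dedup_sq and the file names are made up for illustration, and it assumes that duplicates share the same SN: value and that keeping the first copy is correct. It only removes repeated @SQ lines, so it would not repair a malformed alignment record if the "Parse error at line 29991" has a separate cause.

```shell
# Hypothetical helper: keep only the first @SQ line for each SN: value.
# Reads SAM on stdin, writes SAM on stdout; every other line passes through unchanged.
dedup_sq() {
  awk -F'\t' '
    $1 == "@SQ" {
      name = ""
      # Find the SN: field; the SAM spec does not fix its position on the line.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) name = $i
      if (seen[name]++) next   # skip repeats of an already-seen sequence name
    }
    { print }
  '
}

# Example usage (file names are placeholders):
# dedup_sq < input.sam > dedup.sam
# samtools view -b -o dedup.bam dedup.sam
```

Because awk streams the file line by line, this should cope with a large SAM without loading it into memory.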