Duplicate SAMSequenceDictionary in SAM file - how to modify SAM header?
1
1
Entering edit mode
6.4 years ago
kat ▴ 40

Hi,

I have used bwa mem to map paired end fastq files to a ref genome and the resulting SAM file has duplicated @SQ lines in the SequenceDictionary:

bwa mem -t -M 4 ${REF_DIR}${ref} ${DATA_DIR}${p1} ${DATA_DIR}${p2} > ${sam}

The resulting SAM file generates errors using Picard Tools:

Exception in thread "main" java.lang.IllegalArgumentExceptioannot add sequence that already exists in SAMSequenceDictionary: chr1n: C

As well as converting the SAM to BAM.

[W::sam_hdr_parse] Duplicated sequence 'JTFH01001905.1'
[W::sam_hdr_parse] Duplicated sequence 'JTFH01001906.1'
[W::sam_read1] Parse error at line 29991
[main_samview] truncated file.

How can I replace the SAM header to remove duplicated @SQ lines?

I see there is an old post here: Duplicate @Sq Lines In Sam File Header that suggests the issue can be resolved usign the bwa mem -M argument. I am wondering if there is way to modify the SAM without remapping, as this is a large SAM file.

Thank you!

sam bwa SequenceDictionary • 4.6k views
ADD COMMENT
0
Entering edit mode

samtools quickcheck will check for a valid header in your SAM. Also maybe samtools reheader could be useful? Here's the docs

ADD REPLY
0
Entering edit mode

Thanks, yes, the SAM passes samtools quickcheck and unfortunately, samtools reheader only takes input BAM files.

ADD REPLY
0
Entering edit mode
6.4 years ago

The clunky way to do it:

samtools view -H input.bam > header.sam

fix the header; it's plain text, then

samtools view input.bam | cat header.sam - | samtools view -Sb - > new.bam
ADD COMMENT
0
Entering edit mode

Nice I like it. Apparently the -S option is only needed for older versions of samtools though.

ADD REPLY
0
Entering edit mode

Thank you both! I was also running into issues with samtools view -b $sam > test.bam [W::sam_read1] Parse error at line 29991 [main_samview] truncated file.

I looked at the parse error lines and the @SQ header is duplicated within my SAM file. Is this a known error with bwa mem? Can I remove all these @SQ lines with awk and then use the SAM file?

ADD REPLY
0
Entering edit mode

Sounds like it's not just an issue with the header. When you run samtools view -b it ignores the header. The truncated file error you showed means there's actually something wrong with a mapped read entry. I think you should just remap the sequences, maybe try bowtie2 or STAR as alternatives to bwa.

Also it's possible your reference genome file is what caused the

Exception in thread "main" java.lang.IllegalArgumentExceptioannot add sequence that already exists in SAMSequenceDictionary: chr1n: C

ADD REPLY
0
Entering edit mode

Thank you - yes, remapping, I get the same issue with a duplicated SAM header. This time, I added the -M argument to bwa mem: bwa mem -t 16 -M ${REF_DIR}${ref} ${DATA_DIR}${p1} ${DATA_DIR}${p2} > ${sam}

Is this a known issue and is there something I can do to mitigate? Thank you!

@SQ SN:chr1 LN:248956422 @SQ SN:chr10 LN:133797422 @SQ SN:chr11 LN:135086622 @SQ SN:chr12 LN:133275309 @SQ SN:chr13 LN:114364328 @SQ SN:chr14 LN:107043718 @SQ SN:chr15 LN:101991189 @SQ SN:chr16 LN:90338345 @SQ SN:chr17 LN:83257441 @SQ SN:chr18 LN:80373285 @SQ SN:chr19 LN:58617616 @SQ SN:chr1_KI270706v1_random LN:175055 @SQ SN:chr1_KI270707v1_random LN:32032 @SQ SN:chr1_KI270708v1_random LN:127682 @SQ SN:chr1_KI270709v1_random LN:66860 @SQ SN:chr1_KI270710v1_random LN:40176 @SQ SN:chr1_KI270711v1_random LN:42210 @SQ SN:chr1_KI270712v1_random LN:176043 @SQ SN:chr1_KI270713v1_random LN:40745 @SQ SN:chr1_KI270714v1_random LN:41717 @SQ SN:chr11_KI270721v1_random LN:100316 @SQ SN:chr14_GL000009v2_random LN:201709 @SQ SN:chr14_GL000225v1_random LN:211173 @SQ SN:chr14_KI270722v1_random LN:194050 @SQ SN:chr14_GL000194v1_random LN:191469 @SQ SN:chr14_KI270723v1_random LN:38115 @SQ SN:chr14_KI270724v1_random LN:39555 @SQ SN:chr14_KI270725v1_random LN:172810 @SQ SN:chr14_KI270726v1_random LN:43739 @SQ SN:chr15_KI270727v1_random LN:448248 @SQ SN:chr16_KI270728v1_random LN:1872759 @SQ SN:chr17_GL000205v2_random LN:185591 @SQ SN:chr17_KI270729v1_random LN:280839 @SQ SN:chr17_KI270730v1_random LN:112551 @SQ SN:chr1 LN:248956422 @SQ SN:chr10 LN:133797422 @SQ SN:chr11 LN:135086622 @SQ SN:chr12 LN:133275309 @SQ SN:chr13 LN:114364328 @SQ SN:chr14 LN:107043718 @SQ SN:chr15 LN:101991189 @SQ SN:chr16 LN:90338345 @SQ SN:chr17 LN:83257441 @SQ SN:chr18 LN:80373285 @SQ SN:chr19 LN:58617616

ADD REPLY

Login before adding your answer.

Traffic: 2491 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6