3 months ago
selplat21
I have about 20,000 BAM files in a directory called intermediate_bams.
They all share the same @HD and @SQ lines; there are about 860 @SQ lines.
When I try to merge these BAMs, I get the following error:
samtools merge -h custom_header.sam -o test.bam intermediate_bams/*bam
[E::bam_hdr_write] Header too long for BAM format
Here are the head and tail of my custom_header.sam:
@HD VN:1.6 SO:coordinate
@SQ SN:NC_088602.1 LN:212386202
@SQ SN:NC_088603.1 LN:163726572
@SQ SN:NC_088604.1 LN:122092291
@SQ SN:NC_088605.1 LN:78855516
...
@SQ SN:NW_027043814.1 LN:173419
@SQ SN:NW_027043815.1 LN:151889
@SQ SN:NW_027043816.1 LN:151339
@SQ SN:NW_027043817.1 LN:234180
@SQ SN:NW_027043818.1 LN:593964
The output of wc -l custom_header.sam is
870
and the output of ls -lh custom_header.sam is
-rw-r--r-- 1 nicolas nicolas 27K Aug 2 21:02 custom_header.sam
The large number of @SQ lines reflects the large number of scaffolds in my genome. I have never run into this problem with previous genomes.
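Since wc -l reports 870 lines while only about 860 of them are @SQ lines plus one @HD line, it may help to tally the header by record type and see what the remaining lines are. The following is a minimal sketch, assuming a POSIX shell with awk and sort; count_header_types is a hypothetical helper name chosen for illustration, not a samtools command.

```shell
# Hypothetical helper (not part of samtools): tally SAM header lines
# by record type (@HD, @SQ, @RG, @PG, @CO).
count_header_types() {
    # $1 is a SAM header file. The record type is the first
    # whitespace-separated field of each line that starts with '@'.
    awk '/^@/ { n[$1]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}

# Example usage (file name taken from the post):
#   count_header_types custom_header.sam
#   wc -c < custom_header.sam    # exact header size in bytes
```

The same function could be run on the output of samtools view -H for one of the input BAMs (redirected to a file) to compare its header against custom_header.sam.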
That header should not cause a problem. Please add a comment on the issue I have linked to.