I have around 20 BAM files, and each of them has a proper BAM header including RG ID, Sample, Library, Platform unit, Description and Platform. I was reading the GATK manual and just realized that the only accepted platform names are 454, LS454, Illumina, Solid, ABI_Solid, and CG. But I wanted to be quite specific when converting SAM to BAM and used "Solid5500XL" as the platform name.
I know there are AddOrReplaceReadGroups and ReplaceSamHeader commands in Picard that I can use to edit the BAM header, but that will take some time as I have many large files. Is there an easy way to do this without rewriting the BAM files?
Thanks, all. Somebody on the other forum told me that there is no way to quickly edit minor info.
But the samtools reheader command makes the change you want in the BAM file very quickly. It takes about the same time as copying your BAM file from one place to another.
I think you can do this with samtools: first extract the header into a text file, then correct the value, and then replace the BAM header. Something like this:
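for BAM in *.bam
do
    # dump the current header as text
    samtools view -H "$BAM" > header.sam
    # replace the unsupported platform name
    sed "s/Solid5500XL/Solid/" header.sam > header_corrected.sam
    # reheader writes the new BAM to stdout, so redirect it to a new file
    samtools reheader header_corrected.sam "$BAM" > "${BAM%.bam}.reheader.bam"
done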
I haven't tested it but I think it must work for you. Please let me know if that works (who knows, maybe some day I'll need it).
written 12.5 years ago by JC
Yes, I agree this should work. And since samtools allows piping, I would rewrite the 3 lines inside the do loop to get rid of the intermediate header.sam files, simply as follows:
Really clever, avoiding the temporary files. I like it.
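The piped snippet itself is not preserved in this copy of the thread. A minimal sketch of what such a pipeline could look like is below. The helper names `fix_platform` and `reheader_all` and the `.reheader.bam` output naming are assumptions made for illustration, not part of the original comment.

```shell
#!/bin/sh
# Sketch only: fix_platform and reheader_all are hypothetical helper names,
# and the *.reheader.bam output naming is an assumption.

fix_platform() {
    # Rewrite the PL value on @RG lines only; all other header lines pass through.
    sed '/^@RG/ s/PL:Solid5500XL/PL:Solid/'
}

reheader_all() {
    for BAM in *.bam
    do
        # "-" makes samtools reheader read the new header from stdin,
        # so no intermediate header.sam file is needed.
        samtools view -H "$BAM" | fix_platform | samtools reheader - "$BAM" > "${BAM%.bam}.reheader.bam"
    done
}
```

Calling `reheader_all` in the directory that holds the BAM files would write one corrected copy per input and leave the originals untouched.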
written 12.5 years ago by JC
I just wanted to add that I was getting a segmentation fault (crash) using samtools reheader in a similar way, but it seems to have been fixed when I upgraded samtools to version 0.1.19.
written 11.1 years ago by cmdcolin
I got this directly from one of Jeremy Leipzig's answers quite a long time ago, which really opened my mind back when I was starting to work extensively with BAM files: A: How Can I Edit Some Rows In .Bam Header File?
What if I want to edit the @RG header of my sample in the BAM files I merged? Will it work safely?
I have already finished recalibration twice, but with the wrong header, so I can't present multiple samples in a VCF file.
I agree, the method worked fine for me, but I had to redirect the output of the samtools reheader command to a newly created BAM file. Otherwise the BAM file with the corrected header is printed to the screen, which crashed my remote terminal.
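As a rough illustration of the redirect described above, the wrapper below always sends the output of `samtools reheader` to a file. The function name `safe_reheader` and its argument order are hypothetical, chosen only for this sketch.

```shell
#!/bin/sh
# Sketch only: safe_reheader is a hypothetical wrapper, not a samtools command.
# usage: safe_reheader new_header.sam in.bam out.bam
safe_reheader() {
    if [ "$#" -ne 3 ]; then
        echo "usage: safe_reheader new_header.sam in.bam out.bam" >&2
        return 2
    fi
    if [ "$2" = "$3" ]; then
        # Redirecting onto the input would truncate it before it is read.
        echo "refusing to overwrite the input BAM" >&2
        return 2
    fi
    # samtools reheader writes the BAM to stdout; send it to a file
    # instead of the terminal.
    samtools reheader "$1" "$2" > "$3"
}
```

For example, `safe_reheader header_corrected.sam sample.bam sample.reheader.bam` would write the corrected file next to the original.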
written 7.5 years ago by alanh
Please forgive me if I am wrong, but I don't think this answers the original question. samtools reheader will simply rewrite the BAM file after first writing the header. This means copying each BAM file all the way through. For large BAM files (it is not uncommon for me to work with 200GB files) this is unfeasible.
written 10.1 years ago by jhrf
It is simply impossible to edit only the BAM header. Reheader is the best shot. It is far faster than the alternatives.
written 10.1 years ago by lh3
Thanks for this. Do you know if the original .bai will work on the new BAM (specifically I'm changing the sequence names, so I guess not...).
written 8.9 years ago by Dan
I was thinking of SAM files. I still think the best solution here is to modify the SAM files and redo the BAM conversion, because the allowed platform (PL) values seem to be hard-coded in GATK.
Yes, this is JC's answer.