Bam header editing
5.8 years ago
Huynh Nguyen

Dear all,

I have 100 .bam files and I would like to replace the header of each one with a new header. The new header is the same for all files. The only difference is in the @RG line, where ID and SM should both be set to the BAM file name. How can I do this without reheadering the files one by one?

Thank you all for any help.


Hello,

Are there already read group (@RG) lines in the header (check with samtools view -H input.bam | grep "@RG")? If so, should they be replaced?
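The following sketch answers that question for all 100 files at once. It assumes the headers are read with samtools view -H as above. rg_ids is a hypothetical helper name: it reads a SAM header on stdin and prints the ID of every @RG line, one per line, or nothing if there are none.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM header on stdin and print the ID value
# of every @RG line, one per line (prints nothing if there are no @RG lines).
rg_ids() {
    awk -F '\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if (substr($i, 1, 3) == "ID:") print substr($i, 4)
    }'
}

# Example use (assumes samtools is on PATH; the path is a placeholder):
#   for f in PATH/TO/INPUTS/*.bam; do
#       printf '%s\t%s\n' "$f" "$(samtools view -H "$f" | rg_ids | paste -sd, -)"
#   done
```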

Why do you want the file name as both the read group ID and the sample name (SM)?

fin swimmer


Could you do something along these lines:

samtools view -H in.bam \
  | awk 'BEGIN { FS = OFS = "\t" }
         $1 == "@SQ" { sub(/^SN:/, "SN:chr", $2) }
         { print }' \
  | samtools reheader - in.bam > out.bam

Here awk renames the chromosomes in the @SQ lines from 1, 2, ... to chr1, chr2, and so on. All other header lines and the remaining @SQ fields are passed through unchanged.
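The same reheader pattern can be adapted to the original question. This is a minimal sketch that assumes each input header already contains an @RG line with ID and SM fields. rewrite_rg is a hypothetical helper that sets both fields to a given name.

Note that reheader changes only the header. If the reads carry RG:Z tags with the old ID, those tags will no longer match any @RG line. That is one reason addreplacerg, which also rewrites the tags on the reads, can be the safer choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a SAM header on stdin, set the ID and SM fields
# of every @RG line to NAME ($1), and print the header to stdout.
# Assumption: the @RG lines already have ID and SM fields; missing fields
# are not added. All other lines are passed through unchanged.
rewrite_rg() {
    awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
        $1 == "@RG" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) { $i = "ID:" name }
                else if ($i ~ /^SM:/) { $i = "SM:" name }
            }
        }
        { print }'
}

# Example use (samtools assumed on PATH; paths are placeholders):
#   for f in PATH/TO/INPUTS/*.bam; do
#       name=$(basename "$f" .bam)
#       samtools view -H "$f" | rewrite_rg "$name" \
#           | samtools reheader - "$f" > "PATH/TO/OUTPUT/$name.bam"
#   done
```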

Cheers, Thomas

5.8 years ago

It sounds like what you need is to replace the read group, overwriting the read group of every alignment with a new one:

samtools addreplacerg

Run without arguments, it prints:

Usage: samtools addreplacerg [options] [-r <@RG line> | -R <existing id>] [-o <output.bam>] <input.bam>

Options:
  -m MODE   Set the mode of operation from one of overwrite_all, orphan_only [overwrite_all]
  -o FILE   Where to write output to [stdout]
  -r STRING @RG line text
  -R STRING ID of @RG line in existing header to use
      --input-fmt FORMAT[,OPT[=VAL]]...
               Specify input format (SAM, BAM, CRAM)
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]

Now, if you really don't want to process the entire BAM file, you could edit the BAM file directly. (Text-based editing of the whole file means converting it to SAM and then back to BAM, which would probably be slower than addreplacerg.) Be aware that it is easy to corrupt the files this way if it is done incorrectly. Here is how the BAM format starts:

Field   Description                                               Type
magic   BAM magic string ("BAM\1")                                char[4]
l_text  Length of the header text, including any NUL padding      int32
text    Plain header text in SAM; not necessarily NUL-terminated  char[l_text]

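You would then edit l_text and text, shifting the rest of the file as needed. Keep in mind that a BAM file is BGZF-compressed. These fields only become visible after decompressing the first block(s), and the edited header has to be recompressed into valid BGZF blocks.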
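Here is a read-only sketch of that layout, useful for inspecting the fields before attempting any edit. It assumes that gzip can decompress the BGZF stream (BGZF is a series of gzip members) and that head and tail accept -c. It also assumes a little-endian host: od decodes the int32 in host byte order, while BAM stores it little-endian. bam_l_text and bam_header_text are hypothetical helper names. Writing a modified header back is not attempted here.

```shell
#!/usr/bin/env bash
# Sketch: read the first fields of a BAM file without samtools.
# Assumptions: gzip can read the BGZF stream, head/tail support -c, and the
# host is little-endian (od reads the int32 l_text in host byte order).

# Print l_text, the length of the header text, or fail if the magic is wrong.
bam_l_text() {
    # The decompressed stream must start with the magic "BAM\1"; check "BAM".
    if [ "$(gzip -dc "$1" 2>/dev/null | head -c 3)" != "BAM" ]; then
        echo "not a BAM file: $1" >&2
        return 1
    fi
    # Bytes 5-8 of the decompressed stream hold l_text as an int32.
    gzip -dc "$1" 2>/dev/null | head -c 8 | tail -c 4 | od -An -t d4 | tr -d ' '
}

# Print the raw header text: the l_text bytes that follow the l_text field.
bam_header_text() {
    local n
    n=$(bam_l_text "$1") || return 1
    gzip -dc "$1" 2>/dev/null | head -c $((8 + n)) | tail -c "$n"
}

# Example use (the file name is a placeholder):
#   bam_l_text in.bam
#   bam_header_text in.bam | grep '^@RG'
```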

You are probably better off with addreplacerg.


I ended up using addreplacerg and wrapped it in a for loop:

for file in PATH/TO/INPUTS/*.bam; do
    base_name=$(basename "$file" .bam)
    samtools addreplacerg -r "ID:${base_name}\tSM:${base_name}" \
        -o "PATH/TO/OUTPUT/${base_name}.bam" "$file"
done
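After the loop, it may be worth confirming that every output header ended up with the intended read group. This is a minimal sketch. has_rg_for is a hypothetical helper that reads a SAM header on stdin and succeeds only if some @RG line has both ID and SM equal to the given name. It inspects the header only, not the RG:Z tags on the reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the SAM header on stdin contains an @RG line
# whose ID and SM fields both equal NAME ($1); exit 1 otherwise.
has_rg_for() {
    awk -v name="$1" 'BEGIN { FS = "\t"; found = 0 }
        $1 == "@RG" {
            id = 0; sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i == ("ID:" name)) id = 1
                if ($i == ("SM:" name)) sm = 1
            }
            if (id && sm) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Example use after the loop (samtools assumed on PATH; path is a placeholder):
#   for f in PATH/TO/OUTPUT/*.bam; do
#       name=$(basename "$f" .bam)
#       samtools view -H "$f" | has_rg_for "$name" || echo "RG mismatch: $f"
#   done
```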