1) Renaming contigs
If the order isn't changing you can use samtools reheader
(http://www.htslib.org/doc/samtools-reheader.html). This works because BAM files don't actually store the contig names, but the contig indices.
Picard ReplaceSamHeader
should do a similar thing.
Caveat: I'm not aware of any tool that correctly updates SAM tags such as the SA when renaming contigs. Your split reads will be broken.
WARNING: samtools reheader
only works if the contigs you are removing are the final contigs. Internally, BAM uses the index (not the name) of the contig each read aligns to. Changing [chr1, chr1_Un,chr2]
to [1,2]
deletes "chr2" and remaps all "chr1_Un" reads to "2". It does not rename "chr2" aligned reads to "2".
2) Delete additional contigs
It's unclear what you want to do with reads that are mapped to contigs to delete. Do you want to:
- Remove them from the file:
samtools view -L to_retain.bed
should do the trick.
Note that this will also filter out unmapped reads.
If you want to keep unmapped reads you need to flip it and use -U
: samtools view -L to_remove.bed -U output.bam > /dev/null
You'll need to set the unmapped SAM flag and remove the contig alignment.
Same caveat: I'm not aware of any tool that correctly updates SAM tags such as the SA when renaming contigs. Your split reads will be broken.
TLDR: You're probably better off realigning to the new reference.
see Bam File: Change Chromosome Notation
I wrote http://lindenb.github.io/jvarkit/ConvertBamChromosomes.html
java -jar dist/bamrenamechr.jar -f mapping.tsv in.bam
Thank you very much for your reply! However, I still do not understand the logic of the whole thing. I learned that I can subset the bam file by using a bed file, in order to get rid of the non-standard chromosome, but then when I want to update the header, I both delete non-standard chr and convert chr1->1
samtools view -H test.bam | sed 's/chr//' | sed '/M/d;/random/d;/Un/d;/[0-9]*_/d'
,but It breaks the content of the BAM file when I use rehader: .
I do not really get how the order of editing should be.
I strongly recommend trying it on a tiny BAM file (e.g. with 2 reads).
Be aware that deleting every header with the letter M or any underscore (since
[0-9]*
happily matches nothing) is likely to delete more headers than you intended to.Note that since BAM (not SAM) uses reference contig index, not reference contig name,
samtools reheader
gives incorrect results if you drop any@SQ
headers that aren't at the end of the@SQ
block. For example:to
is ok.
Both
to
doesn't work at all with
reheader
because it's equivalent to dropping chr2 and renamed chrUn_decoy to chr2. It does not rename chr2 to 2.Edit: when I tried your command on one of my bams I got:
This was a result of your
sed
deleting one of the@PG
headers.