Hello! :)
I have a BAM file that I need to modify.
1) delete the non-standard chromosomes (chrUn_gl..., chr_1_.._random etc.)
2) convert chr1 -> 1 , chr2 -> 2 .....
I do this:
samtools view -L subset.bed -b test.bam > test_subset.bam
samtools view -H test_subset.bam | sed 's/chr//' | sed '/M/d;/random/d;/Un/d;/[0-9]*_/d' | samtools reheader - test_subset.bam > test_subset_reheader.bam
.
Reheader option changes the content of the file. Chromosome number shifts like this:
First line test_subset.bam :
A1234:10:H7F2NDSXX:1:1234:1234:1234 147 chr1 9996 0 1234XXXX = 12345 -10
First line test_subset_reheader.bam :
A1234:10:H7F2NDSXX:1:1234:1234:1234 147 2 9996 0 1234XXXX = 12345 -10
.
BUT IT SHOULDN'T BE 2 THERE, IT SHOULD BE 1:
A1234:10:H7F2NDSXX:1:1234:1234:1234 147 1 9996 0 1234XXXX = 12345 -10
.
.
What do you think that it gets shifted? Maybe because of truncating the chrM below?
.
Header test_subset.bam :
@SQ SN:chrM LN:1234
@SQ SN:chr1 LN:1234
@SQ SN:chr2 LN:1234
@SQ SN:chr3 LN:1234
...
...
@SQ SN:chrX LN:1234
@SQ SN:chrY LN:1234
...
...
@SQ SN:chr1_gl000191_random LN:1234
@SQ SN:chr1_gl000192_random LN:1234
...
...
@SQ SN:chrUn_gl000249 LN:1234
Header test_subset_reheader.bam :
@SQ SN:1 LN:1234
@SQ SN:2 LN:1234
@SQ SN:3 LN:1234
@SQ SN:4 LN:1234
@SQ SN:5 LN:1234
..
..
@SQ SN:X LN:1234
@SQ SN:Y LN:1234
It sounds like you have a BAM file that you need to realign to a different reference genome. Alt vs no-alt reference have different alignment behaviours and reheadering will give you different result than you would get from realigning to the 'correct' reference.