Issue with samtools reheader, bam file
1
0
Entering edit mode
3.5 years ago
nrk_02 ▴ 10

Hello! :)

I have a BAM file that I need to modify.

1) delete the non-standard chromosomes (chrUn_gl..., chr_1_.._random etc.)

2) convert chr1 -> 1 , chr2 -> 2 .....

I do this:

samtools view -L subset.bed -b test.bam > test_subset.bam

samtools view -H test_subset.bam | sed 's/chr//' | sed '/M/d;/random/d;/Un/d;/[0-9]*_/d' | samtools reheader - test_subset.bam > test_subset_reheader.bam

.

Reheader option changes the content of the file. Chromosome number shifts like this:

First line test_subset.bam :

A1234:10:H7F2NDSXX:1:1234:1234:1234 147 chr1 9996 0 1234XXXX = 12345 -10

First line test_subset_reheader.bam :

A1234:10:H7F2NDSXX:1:1234:1234:1234 147 2 9996 0 1234XXXX = 12345 -10

.

BUT IT SHOULDN'T BE 2 THERE, IT SHOULD BE 1:

A1234:10:H7F2NDSXX:1:1234:1234:1234 147 1 9996 0 1234XXXX = 12345 -10

.

.

What do you think that it gets shifted? Maybe because of truncating the chrM below?

.

Header test_subset.bam :

@SQ     SN:chrM LN:1234
@SQ     SN:chr1 LN:1234
@SQ     SN:chr2 LN:1234
@SQ     SN:chr3 LN:1234

...

...

@SQ     SN:chrX LN:1234
@SQ     SN:chrY LN:1234

...
...

@SQ     SN:chr1_gl000191_random LN:1234
@SQ     SN:chr1_gl000192_random LN:1234

...
...


@SQ     SN:chrUn_gl000249       LN:1234

Header test_subset_reheader.bam :

@SQ     SN:1    LN:1234
@SQ     SN:2    LN:1234
@SQ     SN:3    LN:1234
@SQ     SN:4    LN:1234
@SQ     SN:5    LN:1234
..

..

@SQ     SN:X    LN:1234
@SQ     SN:Y    LN:1234
samtools chromosome BAM reheader notation • 2.1k views
ADD COMMENT
0
Entering edit mode

It sounds like you have a BAM file that you need to realign to a different reference genome. Alt vs no-alt reference have different alignment behaviours and reheadering will give you different result than you would get from realigning to the 'correct' reference.

ADD REPLY
0
Entering edit mode
3.5 years ago
d-cameron ★ 2.9k

What do you think that it gets shifted?

BAM stores alignments by reference index not by reference name.

Your example reads is aligned to the second contig. Initially this is chr1. After reheadering the second contig is 2.

BAM reheader does not support reorder or deletion of contigs

The work-around is to convert to SAM, reheader, then convert to BAM again.

Note that your approach does not fix SAM tags such as the SA tag and your output file will contain invalid chimeric alignments. Probably ok if you're doing SNV or CNV calling, very very bad if you're going to do any SV calling.

ADD COMMENT
0
Entering edit mode

Thank you very much! My goal is to make a match in chromosome notations of my BAM file and the reference (Unfortunately, I can not use another reference genome at the moment) BAM file header format:

@HD     VN:1  SO:coordinate
@SQ     SN:chrM LN:1234
@SQ     SN:chr1 LN:1234
@SQ     SN:chr2 LN:1234
@SQ     SN:chr3 LN:1234
....
...
... 

Reference header format:

>1 dna:chromosome chromosome:GRCh37:1:1:1234:1
>2 dna:chromosome chromosome:GRCh37:2:1:1234:1
>3 dna:chromosome chromosome:GRCh37:3:1:1234:1
>4 dna:chromosome chromosome:GRCh37:4:1:1234:1
...
..

Then, I will try converting to SAM, reheader, then convert to BAM again and filter the non-standard chromosomes with:

samtools view -L subset.bed ....
ADD REPLY
0
Entering edit mode

What's your goal? If you're not going to realign then you're best off doing your variant calling against the same reference as you aligned to then translating the chromosome names downstream. It's easier to rename chromosome names in (or downstream of) VCF than it is in SAM/BAM/CRAM. For example, in R adding/removing the chr prefix from chromosome names is as simple as:

seqLevelStyles(my_granges) = "UCSC"

I've found this very useful as it allows the use of resources in both notations

ADD REPLY

Login before adding your answer.

Traffic: 1139 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6