Hi,
I have this SAM file which has two MT entries (bolded) as follows;
@SQ SN:MT LN:16569
@SQ SN:1 LN:249250621
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:2 LN:243199373
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:MT LN:16569
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
I got rid of the first MT entry since after BAM conversion, the first MT entry gives 0 mapping;
MT 16569 0 0
1 249250621 469558 0
10 135534747 241345 0
11 135006516 280732 0
12 133851895 256092 0
13 115169878 153564 0
14 107349540 168501 0
15 102531392 160025 0
16 90354753 154228 0
17 81195210 174856 0
18 78077248 118007 0
19 59128983 177415 0
2 243199373 399350 0
20 63025520 151498 0
21 48129895 93210 0
22 51304566 67376 0
3 198022430 332309 0
4 191154276 271651 0
5 180915260 295771 0
6 171115067 299632 0
7 159138663 259500 0
8 146364022 235990 0
9 141213431 230866 0
MT 16569 1608344 0
X 155270560 226806 0
Y 59373566 11557 0
So I got rid of the first MT entry by extracting a header by
samtools view -HS input.sam > header.sam
and deleted it by using an text editor.
Then
samtools reheader header.sam input.bam > new.bam
I made sure the first MT entry was gone in the header;
SQ SN:1 LN:249250621
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:2 LN:243199373
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:MT LN:16569
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
Ok Looking great but then when I further processed the new.BAM file by;
samtools sort new.bam
samtools index new.bam
Then I got an error message;
51501 segmentation fault samtools index Rsorted.bam
The samtools flagstat output looked fine;
7837322 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
7318383 + 0 mapped (93.38%:nan%)
7837322 + 0 paired in sequencing
3918661 + 0 read1
3918661 + 0 read2
6190627 + 0 properly paired (78.99%:nan%)
7119872 + 0 with itself and mate mapped
198511 + 0 singletons (2.53%:nan%)
89782 + 0 with mate mapped to a different chr
89781 + 0 with mate mapped to a different chr (mapQ>=5)
I am using samtools0.1.19-44428cd
I do really appreciate any help on deleting the extra MT entry in the sam header without having this seg-fault.
Thanks in advance.
Hi Devon,
Your script worked great !! It was my understanding from a samtools man page that reheader appeared to be used to manipulate sam/bam headers without messing read counts and also when I looked at some discussion on the web which recommended for the use of reheader.
Anyway thanks!!
The problem is that the chromosome that a read aligns to isn't stored in the read itself. Instead, an integer index (
tid
) is stored in the alignment and samtools (or any other program) looks in the header for that chromosome/contig associated with it. So, if you swapped the order of chr1 and chr2 in the header then all of the reads previously reported to be aligned to chr1 would then be aligned to chr2 (and vice versa). Similarly, by just replacing the header with one omitting a line, everything that previously aligned to the second listed chromosome (1 in your case) would then be aligned to what was previously the third chromosome (10 in your case). That's why there are usually warnings that the chromosomes/contigs should be in the same order when replacing a header.This, BTW, is the benefit to making the change in SAM, since then the chromosome is stored with the alignment.
So, what is a proper way(s) and instance(s) to use samtools reheader?
Thanks.
If you want to change UCSC to Ensembl chromosome names then it's a convenient way to do so. Also, if you have a BAM file with alignments but no header (this is normally a programming error) then this is also useful.
I should add that it's also handy if you want to add optional header fields, like the @PG or @CO lines. Otherwise, a better name for that command is probably "samtools renameContigs".