Reordering the bam file
2
2
Entering edit mode
10.2 years ago
Dataminer ★ 2.8k

Hi,

I have my bam file in this order:

samtools view -H my.bam

@CO     Duplicates marked (bamUtil Version: 1.0.2)
@SQ     SN:chr1 LN:249250621    AS:hg19
@SQ     SN:chr10        LN:135534747    AS:hg19
@SQ     SN:chr11        LN:135006516    AS:hg19
@SQ     SN:chr12        LN:133851895    AS:hg19
@SQ     SN:chr13        LN:115169878    AS:hg19
@SQ     SN:chr14        LN:107349540    AS:hg19
@SQ     SN:chr15        LN:102531392    AS:hg19
@SQ     SN:chr16        LN:90354753     AS:hg19
@SQ     SN:chr17        LN:81195210     AS:hg19
@SQ     SN:chr18        LN:78077248     AS:hg19
@SQ     SN:chr19        LN:59128983     AS:hg19
@SQ     SN:chr2 LN:243199373    AS:hg19
@SQ     SN:chr20        LN:63025520     AS:hg19
@SQ     SN:chr21        LN:48129895     AS:hg19
@SQ     SN:chr22        LN:51304566     AS:hg19
@SQ     SN:chr3 LN:198022430    AS:hg19
@SQ     SN:chr4 LN:191154276    AS:hg19
@SQ     SN:chr5 LN:180915260    AS:hg19
@SQ     SN:chr6 LN:171115067    AS:hg19
@SQ     SN:chr7 LN:159138663    AS:hg19
@SQ     SN:chr8 LN:146364022    AS:hg19
@SQ     SN:chr9 LN:141213431    AS:hg19
@SQ     SN:chrM LN:16571        AS:hg19
@SQ     SN:chrX LN:155270560    AS:hg19
@SQ     SN:chrY LN:59373566     AS:hg19
@PG     ID:bwa  PN:bwa  VN:0.6.1-r104

And I wanna order it as, this:

How can I sort it in this order and arrange it?

Thank you

ChIP-Seq Samtools Bamtools • 5.8k views
ADD COMMENT
5
Entering edit mode
10.2 years ago

use picard ReorderSam

ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs

ADD COMMENT
0
Entering edit mode
10.2 years ago
Ming Tommy Tang ★ 4.5k

you can use samtools reheader to do it.

for example :

edit a bam header(insert a line)

samtools view -H my.bam | sed '24 a @SQ SN:chrY LN:59373566' | samtools reheader - my.bam > my.reheaded.bam
ADD COMMENT
2
Entering edit mode

wrong: reheader won't change the order of the reads in the bam file . The reference index of the reads will be wrongly-assigned.

ADD REPLY
1
Entering edit mode

Please don't ever do that. You'll end up changing the chromosome to which all of your reads align. The reheader command should only ever be used to change the name (but never the order) of contigs/chromosomes. Picard handles this better (see Pierre's answer).

ADD REPLY
0
Entering edit mode

Thank you for your comment. I had a bam file (downloaded from the ENCODE) sequenced from a female, and it does not have chrY in the header. one of the program I used complained about missing the chrY info in the header, and I have to manually add it by samtools reheader. Does it affect the alignment?

ADD REPLY
1
Entering edit mode

As long as you appended chrY at the end then no, it didn't. If you put it anywhere else, then yes, the alignments were affected.

ADD REPLY
0
Entering edit mode

I read this discussion with interest and wished you could clarify a point. Say that you generate an unsorted SAM file. When you invoke picard/samtools to sort by coordinate, the SQ header lines will be used to set how you reorder your reads. If you want a different order, it would make sense to me to change the order of the SQ lines as long as that is done before invoking picard/samtools to reorder. Every record in a SAM/BAM file includes the name of the chromosome, so I don't see how going through the workflow above may mess things up. Indexing the BAM file of course would be performed after reordering. So reheading > sorting > indexing. By contrast, looks to me that picard/samtools won't help reordering reads as you wanted if the order in the SQ header lines is not what you want. What am I missing?

ADD REPLY
2
Entering edit mode

Every record in SAM files contains the name of the chromosome, but that's not true in BAM files. In BAM files, every record contains a tid field, which is an index into a structure held in the header (equivalent to the @SQ lines in a SAM file). If you reheader a BAM file, then this structure is replaced, but the alignment records themselves and their tid fields are unchanged. So, if you swap the order of chromosomes with samtools reheader of a BAM file then you'll do an in-place alteration of the mappings. This is typically not what people want to do.

ADD REPLY
0
Entering edit mode

Thanks Devon, very clear. In my workflow I rehead a SAM file (with a bash script) to sort SQ header lines as I want. The product is still a SAM file. I then use picard to sort by coordinate (which now forces my SQ header line order) the SAM file, convert to BAM, and index, all together. This seems not to produce any unwanted behavior as reheading is done at the SAM stage, correct? I can consider also less elaborate ways to enforce the sorting I want, but I looked into samtools/picard and all they offer is sorting by read name or by coordinate (that is, the SQ header line).

ADD REPLY
1
Entering edit mode

Replacing the header of SAM files should always be a safe operation (unless you drop a chromosome, which will either make alignments unmapped or cause an error...I don't recall which).

ADD REPLY

Login before adding your answer.

Traffic: 2804 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6