Entering edit mode
10.3 years ago
Dataminer
★
2.8k
Hi,
I have my bam file in this order:
samtools view -H my.bam
@CO Duplicates marked (bamUtil Version: 1.0.2)
@SQ SN:chr1 LN:249250621 AS:hg19
@SQ SN:chr10 LN:135534747 AS:hg19
@SQ SN:chr11 LN:135006516 AS:hg19
@SQ SN:chr12 LN:133851895 AS:hg19
@SQ SN:chr13 LN:115169878 AS:hg19
@SQ SN:chr14 LN:107349540 AS:hg19
@SQ SN:chr15 LN:102531392 AS:hg19
@SQ SN:chr16 LN:90354753 AS:hg19
@SQ SN:chr17 LN:81195210 AS:hg19
@SQ SN:chr18 LN:78077248 AS:hg19
@SQ SN:chr19 LN:59128983 AS:hg19
@SQ SN:chr2 LN:243199373 AS:hg19
@SQ SN:chr20 LN:63025520 AS:hg19
@SQ SN:chr21 LN:48129895 AS:hg19
@SQ SN:chr22 LN:51304566 AS:hg19
@SQ SN:chr3 LN:198022430 AS:hg19
@SQ SN:chr4 LN:191154276 AS:hg19
@SQ SN:chr5 LN:180915260 AS:hg19
@SQ SN:chr6 LN:171115067 AS:hg19
@SQ SN:chr7 LN:159138663 AS:hg19
@SQ SN:chr8 LN:146364022 AS:hg19
@SQ SN:chr9 LN:141213431 AS:hg19
@SQ SN:chrM LN:16571 AS:hg19
@SQ SN:chrX LN:155270560 AS:hg19
@SQ SN:chrY LN:59373566 AS:hg19
@PG ID:bwa PN:bwa VN:0.6.1-r104
And I wanna order it as, this:
How can I sort it in this order and arrange it?
Thank you
wrong: reheader won't change the order of the reads in the bam file . The reference index of the reads will be wrongly-assigned.
Please don't ever do that. You'll end up changing the chromosome to which all of your reads align. The reheader command should only ever be used to change the name (but never the order) of contigs/chromosomes. Picard handles this better (see Pierre's answer).
Thank you for your comment. I had a bam file (downloaded from the ENCODE) sequenced from a female, and it does not have chrY in the header. one of the program I used complained about missing the chrY info in the header, and I have to manually add it by samtools reheader. Does it affect the alignment?
As long as you appended chrY at the end then no, it didn't. If you put it anywhere else, then yes, the alignments were affected.
I read this discussion with interest and wished you could clarify a point. Say that you generate an unsorted SAM file. When you invoke picard/samtools to sort by coordinate, the SQ header lines will be used to set how you reorder your reads. If you want a different order, it would make sense to me to change the order of the SQ lines as long as that is done before invoking picard/samtools to reorder. Every record in a SAM/BAM file includes the name of the chromosome, so I don't see how going through the workflow above may mess things up. Indexing the BAM file of course would be performed after reordering. So reheading > sorting > indexing. By contrast, looks to me that picard/samtools won't help reordering reads as you wanted if the order in the SQ header lines is not what you want. What am I missing?
Every record in SAM files contains the name of the chromosome, but that's not true in BAM files. In BAM files, every record contains a
tid
field, which is an index into a structure held in the header (equivalent to the @SQ lines in a SAM file). If you reheader a BAM file, then this structure is replaced, but the alignment records themselves and their tid fields are unchanged. So, if you swap the order of chromosomes with samtools reheader of a BAM file then you'll do an in-place alteration of the mappings. This is typically not what people want to do.Thanks Devon, very clear. In my workflow I rehead a SAM file (with a bash script) to sort SQ header lines as I want. The product is still a SAM file. I then use picard to sort by coordinate (which now forces my SQ header line order) the SAM file, convert to BAM, and index, all together. This seems not to produce any unwanted behavior as reheading is done at the SAM stage, correct? I can consider also less elaborate ways to enforce the sorting I want, but I looked into samtools/picard and all they offer is sorting by read name or by coordinate (that is, the SQ header line).
Replacing the header of SAM files should always be a safe operation (unless you drop a chromosome, which will either make alignments unmapped or cause an error...I don't recall which).