A Quick Way To View Or Check The Order Of Chromosome In Bam File
3
2
Entering edit mode
11.1 years ago
Tonyzeng ▴ 310

HI I have question here, if I just need to check or view the order of chromosome in BAM file after aligment with BWA, what tools can do this and how?

I used $ samtools tview sample.bam and could not help me find this information of chromosome order easily because " Within the tview of samtools, you can only jump to a new location by typing g: and a location, like g:chr1:10,000,000." from Wikipedia

bam samtools • 14k views
ADD COMMENT
0
Entering edit mode

I cannot check it now, but is samtools idxstats aln.sorted.bam doing what you want?

ADD REPLY
9
Entering edit mode
11.1 years ago

The first thing to check is that the file is actually coordinate sorted. This can be done with samtools view -H sample.bam | grep SO, which should produce something like @HD VN:1.3 SO:coordinate if the file is in fact sorted. If it is already sorted, then the ordering of the reads and the ordering of the @SQ lines in the header should match. So, simply samtools view -H sample.bam | grep SQ | cut -f 2 | awk '{ sub(/^SN:/, ""); print;}' to quickly get the ordering without needing to go through the entire file.

ADD COMMENT
0
Entering edit mode

Thank you dpryan, that is so helpful!

ADD REPLY
0
Entering edit mode

Dpryan, Thank you! Could I ask one more question? what is the best way to check the chromosome order of reference file with .fa format and snp variant file with .vcf format?

ADD REPLY
1
Entering edit mode

Hmm, there's no super quick equivalent that I know of. For the order of a fasta file, you can either grep ">" reference.fa or, if the fasta file was already indexed with samtools faidx, then just cut -f 1 reference.fa.fai. To get the order of the SNPs, you have to process the whole file unless you've already tabix indexed it (this is part of many SNP calling pipelines). If you've already tabix indexed the file, then I think tabix -l snps.vcf.bgz will work. Otherwise, grep -v "^#" snps.vcf | cut -f 1 | uniq should work.

ADD REPLY
0
Entering edit mode

Thank you dpryan79, I am still having a question. Now I do know what is the "Chr" order of the header of BAM file but how about the real "Chr" order of the reads of BAM file? I ask this question because I found the my original SAM file (my BAM file was converted by this SAM file) has the different "Chr" order on the header and real reads. So I doubt that my BAM may has different chr order on the header and reads.

ADD REPLY
0
Entering edit mode

If the SAM file wasn't explicitly sorted, then they could be in any order. SAM and BAM files can either be coordinate-sorted, name-sorted, or unsorted. Generally things are unsorted when they come directly from an aligner (except tophat, which has an optionally disableable sorting step).

ADD REPLY
0
Entering edit mode

May I ask what the difference between three order ways?

ADD REPLY
0
Entering edit mode

Two are sorted in various ways, one is just whatever random order you get.

ADD REPLY
0
Entering edit mode

Be sure to sort before using uniq, or you may get spurious results.

ADD REPLY
0
Entering edit mode

The premise is that the files are already sorted, I think. If not, your perl solution below would be best in most cases.

ADD REPLY
0
Entering edit mode

For VCF, using BEDOPS:

$ vcf2bed < foo.vcf > foo.bed && bedextract --list-chr foo.bed

Documentation: vcf2bed and bedextract.

ADD REPLY
2
Entering edit mode
11.1 years ago

Duplicate of is my BAM file sorted ?

ADD COMMENT
1
Entering edit mode
11.1 years ago

Assuming that chromosomes aren't interleaved, you could use a hash table lookup to preserve the discovered order, e.g., via Perl:

$ samtools view reads.bam | cut -f3 | perl -ne 'print unless $seen{$_}++' -
chr15
chr7
chr16
chr11
chr12
chr1
chr5
...
ADD COMMENT

Login before adding your answer.

Traffic: 1918 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6