Comparing Two Bam Files Obtained From The Same Files And The Exact Same Settings
10.8 years ago
roll ▴ 350

Hi, I just ran tophat2 with bowtie1 twice on the same fastq files with the exact same settings, and I am now comparing the two resulting bam files. I used the Unix cmp command, and it keeps telling me that

bam1 bam2 differ: byte 27, line 1

When I look at the first line, I do not see any difference at all.

  1. Is this difference expected between two bam files even when they were obtained with the exact same settings?
  2. How can I actually see which bits differ between the two bam files?
bam tophat2 • 7.8k views

Convert the files to SAM (or uncompressed BAM); there are simply too many caveats with byte-by-byte comparisons of compressed files.
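To illustrate the point about compressed files, here is a minimal Python sketch that compares two files after decompression instead of byte by byte. It relies on the fact that BAM uses BGZF, which is a series of concatenated gzip members that Python's `gzip` module can read. The function names `decompressed_sha256` and `same_after_decompression` are made up for this example. Note that this still includes the header, so a header that records a different command line or timestamp will make the digests differ; converting to SAM (for example with `samtools view -h`) and inspecting the text is the way to see what actually changed.

```python
import gzip
import hashlib
import sys


def decompressed_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the decompressed content of a gzip/BGZF file."""
    digest = hashlib.sha256()
    # gzip.open transparently reads every concatenated gzip member (BGZF blocks).
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_after_decompression(path_a, path_b):
    """True if both files hold identical data once decompressed (header included)."""
    return decompressed_sha256(path_a) == decompressed_sha256(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_decompressed.py run1.bam run2.bam
    print("identical" if same_after_decompression(sys.argv[1], sys.argv[2]) else "different")
```

If the digests match, the `cmp` difference came purely from the compression layer; if they do not, the difference is in the header or the records themselves.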

Cool, I just did that, and this time I found the bitwise flags to be different (one is 417 and the other is 161). Why do you think there is a difference?

That's a multimapping read with (presumably) multiple reported alignments. The two flags differ only in bit 256 (0x100), which marks an alignment as secondary, i.e. not primary. If all of the alignments are equally good, then one can just be randomly chosen as being primary. Presumably, then, this is just a difference in which alignment was assigned as primary.
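For anyone who wants to check this themselves, here is a small Python sketch that decodes SAM flags into their component bits, using the bit values from the SAM specification. The short labels and the helper names `decode_flag` and `flag_difference` are my own, chosen for this example only.

```python
# Bit values from the SAM specification; the short labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the labels of all bits set in a SAM flag, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def flag_difference(flag_a, flag_b):
    """List the labels of the bits that are set in exactly one of the two flags."""
    return decode_flag(flag_a ^ flag_b)


print(decode_flag(417))           # ['paired', 'mate_reverse', 'read2', 'secondary']
print(decode_flag(161))           # ['paired', 'mate_reverse', 'read2']
print(flag_difference(417, 161))  # ['secondary']
```

So 417 is simply 161 plus the secondary-alignment bit, which fits the explanation that the two runs picked a different alignment as primary.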

That explains it all. Thanks for pointing that out.

Is there a way of setting a seed for this?

No, bowtie2 has deterministic output by default. Every aligner will be different here. If you need the exact same results for CI testing, then just use a small file without multimappers.

10.8 years ago
  • You cannot just use cmp: if the SAM header contains something like a date or a timestamp, the files will differ as a whole.

I wrote a tool to compare two BAMs: https://github.com/lindenb/jvarkit/wiki/CmpBams

Example:

$ java -jar dist/cmpbams.jar -F -C tmp1.bam tmp2.bam tmp3.bam

#READ-Name  tmp1.bam tmp2.bam|tmp1.bam tmp3.bam|tmp2.bam tmp3.bam   tmp1.bam    tmp2.bam    tmp3.bam
HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/1    EQ|EQ|EQ    K01:2136=83/43M3I54M    K01:2136=83/43M3I54M    K01:2136=83/43M3I54M
HWI-1KL149:20:C1CU7ACXX:1:1101:17626:32431/2    EQ|EQ|EQ    K01:2059=163/100M   K01:2059=163/100M   K01:2059=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/1    EQ|EQ|EQ    K01:2133=83/100M    K01:2133=83/100M    K01:2133=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1102:16831:71728/2    EQ|EQ|EQ    K01:2059=163/100M   K01:2059=163/100M   K01:2059=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/1 EQ|EQ|EQ    K01:2213=83/100M    K01:2213=83/100M    K01:2213=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1105:3309:27760/2 EQ|EQ|EQ    K01:2081=163/100M   K01:2081=163/100M   K01:2081=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/1 EQ|EQ|EQ    K01:2136=83/43M3I54M    K01:2136=83/43M3I54M    K01:2136=83/43M3I54M
HWI-1KL149:20:C1CU7ACXX:1:1106:2914:12111/2 EQ|EQ|EQ    K01:2059=163/100M   K01:2059=163/100M   K01:2059=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/1    EQ|EQ|EQ    K01:2123=83/56M3I41M    K01:2123=83/56M3I41M    K01:2123=83/56M3I41M
HWI-1KL149:20:C1CU7ACXX:1:1107:11589:17295/2    EQ|EQ|EQ    K01:1990=163/100M   K01:1990=163/100M   K01:1990=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/1    EQ|EQ|EQ    K01:2123=83/56M3I41M    K01:2123=83/56M3I41M    K01:2123=83/56M3I41M
HWI-1KL149:20:C1CU7ACXX:1:1110:14096:95943/2    EQ|EQ|EQ    K01:1990=163/100M   K01:1990=163/100M   K01:1990=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1110:15369:59046/1    EQ|EQ|EQ    K01:2213=83/100M    K01:2213=83/100M    K01:2213=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1110:15369:59046/2    EQ|EQ|EQ    K01:2081=163/100M   K01:2081=163/100M   K01:2081=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1111:8599:97362/1 EQ|EQ|EQ    K01:2213=83/100M    K01:2213=83/100M    K01:2213=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1111:8599:97362/2 EQ|EQ|EQ    K01:2081=163/100M   K01:2081=163/100M   K01:2081=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1113:10490:30873/1    EQ|EQ|EQ    K01:2120=83/100M    K01:2120=83/100M    K01:2120=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1113:10490:30873/2    EQ|EQ|EQ    K01:1990=163/100M   K01:1990=163/100M   K01:1990=163/100M
HWI-1KL149:20:C1CU7ACXX:1:1113:12360:36316/1    EQ|EQ|EQ    K01:2213=83/100M    K01:2213=83/100M    K01:2213=83/100M
HWI-1KL149:20:C1CU7ACXX:1:1113:12360:36316/2    EQ|EQ|EQ    K01:2081=163/100M   K01:2081=163/100M   K01:2081=163/100M

(...)
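Independently of CmpBams, the header caveat above can also be handled with a few lines of Python once the BAMs have been converted to SAM text. This is only a rough sketch, not how CmpBams works: it skips header lines (they start with `@`, which the SAM specification does not allow in read names), groups alignments by read name and mate, and reports the reads whose set of (reference, position, flag, CIGAR) tuples differs between the two files. The names `load_alignments` and `diff_sams` are hypothetical, and everything is held in memory, so it is only suitable for small files.

```python
import collections
import sys


def load_alignments(sam_path):
    """Map (read name, mate number) to the set of (rname, pos, flag, cigar) tuples."""
    records = collections.defaultdict(set)
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # header line: may legitimately differ between runs
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:
                continue  # not a complete alignment record
            qname, flag = fields[0], int(fields[1])
            # 0x40 = first in pair, 0x80 = second in pair, otherwise unpaired (0)
            mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
            records[(qname, mate)].add((fields[2], int(fields[3]), flag, fields[5]))
    return records


def diff_sams(path_a, path_b):
    """Return {(read name, mate): (alignments in A, alignments in B)} for reads that differ."""
    a, b = load_alignments(path_a), load_alignments(path_b)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        in_a, in_b = a.get(key, set()), b.get(key, set())
        if in_a != in_b:
            diffs[key] = (sorted(in_a), sorted(in_b))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_sams.py run1.sam run2.sam
    for (qname, mate), (in_a, in_b) in diff_sams(sys.argv[1], sys.argv[2]).items():
        print(qname, mate, in_a, in_b, sep="\t")
```

Because the flag is part of each tuple, a read whose primary and secondary alignments were swapped between runs shows up as different even though the set of mapped positions is the same.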

This looks like a nice tool. I will try it.

Jvarkit is the mightiest. Hi Pierre, do you think it would be possible for CmpBams to report the difference in leftmost coordinates for a given read, as well as its CIGAR, in two different alignments? I am trying to map clipped reads to find some circle split reads.
