Samtools/Picard Fixmate Not Working On A Bam With Multiple Alignments Per Read?
1
3
Entering edit mode
11.7 years ago
scottsmith1 ▴ 180

I have a BAM file with with multiple alignments per-read (NH:i:2) of cDNA sequence. It seems like there is a bug in how "samtools sort" and "samtools fixmate" work together in this case …or perhaps I am using these incorrectly?
I see the same thing with the equivalent Picard tools.

Here are my four rows from "samtools view", representing one read pair with two hits per read.

  1. HWI-ST495_132993521:3:1101:1143:51868 385 16 72090091 3 52M286N48M = 72090091 0 AGCT qqqq RG:Z:2890686892 PG:Z:2890686892 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU XS:A:+ NH:i:2 CC:Z:= CP:i:72090091 HI:i:0
  2. HWI-ST495_132993521:3:1101:1143:51868 129 16 72090091 3 52M2010N48M = 72090091 0 AGCT qqqq RG:Z:2890686892 PG:Z:2890686892 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:100 YT:Z:UU XS:A:+ NH:i:2 HI:i:1
  3. HWI-ST495_132993521:3:1101:1143:51868 369 16 72094060 3 100M = 72094060 0 RG:Z:2890686892 AGCT qqqq PG:Z:2890686892 AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:96G3 YT:Z:UU XS:A:+ NH:i:2 CC:Z:= CP:i:72094060 HI:i:0
  4. HWI-ST495_132993521:3:1101:1143:51868 113 16 72094060 3 100M = 72094060 0 RG:Z:2890686892 AGCT qqqq PG:Z:2890686892 AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:96G3 YT:Z:UU XS:A:+ NH:i:2 HI:i:1

(The sequence and quality in columns swapped for "AGCT" and "qqqq" because the data is protected, and for readability):

Effectively:

  1. read 2, hit-index 0
  2. read 2, hit-index 1
  3. read 1, hit-index 0
  4. read 1, hit-index 1

Fixmate has set columns 7 and 8 to the position of the mate for each alignment. It has rows 1 & 2 as a pair, and 3 & 4 as a pair, instead of 1&3 and 2&4.

Looking at the sort code, when name sorting, it sorts by name, then position. So if the top two alignments are different cigar strings at the same position, pairs will not be adjacent.

Looking at the fixmate code, it seems to presume pairs will be adjacent in a "name-sorted" BAM.

It seems that for sort and fixmate to work together properly, either the name-sort option must do further work to ensure pairs are adjacent when NH > 1, such as sort by name then HI (hit index), then position instead of name-then-position. Or that fixmate must be a little smarter and have a window of more than two alignments.

Have others encountered this? Am I missing something standard done when using fixmate with a BAM with multiple alignments?

Thanks in advance! Scott

deleted-post • 6.4k views
ADD COMMENT
0
Entering edit mode

I undeleted this post. Did you delete it because you answered your own question? Unless you found it was a duplicate it seems like a good one to keep.

ADD REPLY
1
Entering edit mode

I realized that, when emitting multiple hits per read, the secondary alignments were not necessarily paired at all. Some of the premise in that question is flawed.

It would have been better to annotate it as such though. And given Alec's response below it was not as flawed as I originally thought. Thanks for undeleting it...

ADD REPLY
3
Entering edit mode
11.7 years ago
scottsmith1 ▴ 180

I may have answered my own question: fixmate was perhaps not intended to run on BAMs with multiple alignments per-read.

I'm using HTSeq to count hits against transcripts, and this tool emits errors if the fixmate data (columns 7 & 8) are present but inconsistent. If fixmate had not been run at all, it wouldn't have put inconsistent data in the BAM. It would possibly be ideal for fixmate to either ignore cases where the mate has multiple mappings, or point to the primary alignment for the mate. Because it doesn't, the only way to keep it from doing the wrong thing is to not try to use for BAMs which include non-primary alignments.

ADD COMMENT
1
Entering edit mode

This was confirmed by Alec Wysoker when I posted the same question to the samtools mailing list.

On Mar 4, 2013, at 10:01 AM, Alec Wysoker wrote:

Hi Scott,

I'm afraid FixMateInformation isn't going to help you, at least not as much as you probably would like. It doesn't handle secondary alignments. It used to not pay any attention to the secondary alignment flag, but as of the most recently release, it will skip over the secondary alignments. So, if you run the latest version it should properly fix the primary alignments, and the secondaries will get written to the output unchanged.

There is no reason in principle that it couldn't do the right thing, given that your secondary alignments are properly correlated with HI, but there hasn't been much demand for this, and this isn't a program we use ourselves very much because we make sure the mate info is correct from the start.

-Alec

ADD REPLY

Login before adding your answer.

Traffic: 2789 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6