Question

Extract all BAM reads that intersect a given region using the BAI index

0

Entering edit mode

4.9 years ago

bugmaster • 0

Let's say I have a BAI index, and a BAM file. I want to extract all the reads that intersect a given region on a chromosome, while reading as little data from the BAM file as possible. How can I do this ?

At first, I naively thought that I could use the linear index to simply find all the linear bins that intersect my region. But this does not appear to be enough, since there could be long reads that start before the start of my region, and end after it. Such reads intersect my region, but I won't find them using the naive approach.

I thought of using the R-Tree index as well, but it looks like I'd need to essentially read the entire BAM file in order to find all the intersecting reads -- since there's no guarantee that bin 0 (the one that spans the entire BAM file) doesn't contain the one long read that intersects my region.

Is there a better approach, or am I missing something ? I apologize for asking such a primitive question, but I'm new to BAM files, and not yet familiar with the accepted practices.

bam • 1.4k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 4.9 years ago by bugmaster • 0

0

Entering edit mode

Not an answer as I haven't implemented this myself so don't know all the ins and outs.

However basically the BAI index maps ranges (produced as "bins") to file offsets. Given the R-Tree isn't binary, a single bin may have multiple start/stop points for data within it, hence the linear index too.

There is code in htslib's hts.c to do this, but it's nightmarishly complicated due to a myriad of closely related index formats. Perhaps something like IGV's code may be easier to understand. It's certainly much shorter so may be a good learning code.

https://github.com/igvteam/igv.js/blob/master/js/bam/bamIndex.js

ADD REPLY • link 4.9 years ago by jkbonfield ★ 1.3k

score 1 · Answer 1 · 2019-12-25

1

Entering edit mode

4.9 years ago

swbarnes2 14k

Why can't you use samtools?

ADD COMMENT • link 4.9 years ago by swbarnes2 14k

0

Entering edit mode

I suppose I could, but I want to actually understand how the BAM/BAI files are structured, so that I can ingest and parse them on my own.

ADD REPLY • link 4.9 years ago by bugmaster • 0

0

Entering edit mode

Why would you want to write your own parser if everything you need already exists?

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k

0

Entering edit mode

It's a better use of your time to not reinvent the wheel, but to learn how to use the tools that everyone else already uses and understands. It also makes it easier for people to replicate what you did if you use common tools.

ADD REPLY • link 4.9 years ago by swbarnes2 14k