Extract all BAM reads that intersect a given region using the BAI index
1
0
Entering edit mode
4.9 years ago
bugmaster • 0

Let's say I have a BAI index, and a BAM file. I want to extract all the reads that intersect a given region on a chromosome, while reading as little data from the BAM file as possible. How can I do this ?

At first, I naively thought that I could use the linear index to simply find all the linear bins that intersect my region. But this does not appear to be enough, since there could be long reads that start before the start of my region, and end after it. Such reads intersect my region, but I won't find them using the naive approach.

I thought of using the R-Tree index as well, but it looks like I'd need to essentially read the entire BAM file in order to find all the intersecting reads -- since there's no guarantee that bin 0 (the one that spans the entire BAM file) doesn't contain the one long read that intersects my region.

Is there a better approach, or am I missing something ? I apologize for asking such a primitive question, but I'm new to BAM files, and not yet familiar with the accepted practices.

bam • 1.4k views
ADD COMMENT
0
Entering edit mode

Not an answer as I haven't implemented this myself so don't know all the ins and outs.

However basically the BAI index maps ranges (produced as "bins") to file offsets. Given the R-Tree isn't binary, a single bin may have multiple start/stop points for data within it, hence the linear index too.

There is code in htslib's hts.c to do this, but it's nightmarishly complicated due to a myriad of closely related index formats. Perhaps something like IGV's code may be easier to understand. It's certainly much shorter so may be a good learning code.

https://github.com/igvteam/igv.js/blob/master/js/bam/bamIndex.js

ADD REPLY
1
Entering edit mode
4.9 years ago

Why can't you use samtools?

ADD COMMENT
0
Entering edit mode

I suppose I could, but I want to actually understand how the BAM/BAI files are structured, so that I can ingest and parse them on my own.

ADD REPLY
0
Entering edit mode

Why would you want to write your own parser if everything you need already exists?

ADD REPLY
0
Entering edit mode

It's a better use of your time to not reinvent the wheel, but to learn how to use the tools that everyone else already uses and understands. It also makes it easier for people to replicate what you did if you use common tools.

ADD REPLY

Login before adding your answer.

Traffic: 2197 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6