I made a website to help visualize the BAI file format, specifically the binning index. This is a pretty niche tool that doesn't have much actual bioinformatic value besides helping to understand the BAI file structure. I made it because I felt like I never had a 100% firm grasp on BAI indexing.
https://cmdcolin.github.io/bam_index_visualizer/
Summary
The BAM index (BAI) allows users to download only the data that is needed for a particular query, e.g. chr1:1-100, from the BAM file. The BAI is read into memory in full and contains bins, which themselves contain one or more "start and end" pointers to where in the BAM file to look for the reads for your query. This program will show you what this bin structure looks like in a given BAI file.
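To make the "which bins do I need for my query" step concrete, here is a minimal sketch in Python of the reg2bins calculation described in the SAM/BAM specification. It is not the visualizer's own code, and the function name and the 0-based, half-open coordinate convention are choices made for this example.

```python
from typing import List

def reg2bins(beg: int, end: int) -> List[int]:
    """Return the bin numbers that may overlap the 0-based, half-open region [beg, end)."""
    end -= 1  # work with the last base that is inside the region
    bins = [0]  # bin 0 spans the whole 512Mbp range, so it is always a candidate
    # (first bin number at this level, log2 of the bin width at this level)
    for first_bin, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(first_bin + (beg >> shift), first_bin + (end >> shift) + 1))
    return bins

# chr1:1-100 in 1-based inclusive coordinates is [0, 100) here
print(reg2bins(0, 100))
```

For a small region like this there is exactly one candidate bin per level, so only six bins need to be looked up in the index.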
The first chart below shows the 512Mbp overview. The bins in the BAI cannot address chromosomes larger than 512Mbp, so this graph shows the "total overview". Bins are colored by how much data is in them, scaled against the largest bin. You can also click and drag the grey bar above the view to "zoom in" or side scroll the canvas.
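The 512Mbp limit falls out of the bin arithmetic: the smallest bins cover 2^14 bp (16kbp), each level up is 8 times wider, and there are five levels below the single root bin, so the root covers 2^(14 + 3*5) = 2^29 bp. A small sketch of that arithmetic follows; the names MIN_SHIFT, DEPTH and bin_levels are made up for this example.

```python
MIN_SHIFT = 14  # smallest bin spans 2**14 = 16,384 bp
DEPTH = 5       # number of levels below the single root bin

def bin_levels(min_shift=MIN_SHIFT, depth=DEPTH):
    """Return (level, number_of_bins, bin_span_bp, first_bin_number) for each level."""
    levels = []
    first_bin = 0
    for level in range(depth + 1):
        n_bins = 8 ** level                           # each bin splits into 8 children
        span = 1 << (min_shift + 3 * (depth - level)) # width of one bin at this level
        levels.append((level, n_bins, span, first_bin))
        first_bin += n_bins
    return levels

for level, n_bins, span, first_bin in bin_levels():
    print(f"level {level}: {n_bins} bins of {span} bp, starting at bin {first_bin}")
```

The root level reports one bin of 536,870,912 bp, which is the 512Mbp ceiling shown in the overview chart.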
The second chart shows an overview of the byte ranges that would be requested from the BAM, and it is responsive to zooming in and out on the first chart.
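As a rough illustration of how chunk pointers turn into byte ranges: each "start and end" pointer in the BAI is a 64-bit BGZF virtual offset, where the upper 48 bits are the byte offset of a compressed block in the BAM file and the lower 16 bits are the position inside that block once decompressed. The sketch below, with made-up function names and made-up chunk values, decodes the offsets and merges touching chunks. It is an assumption about one reasonable way to do this, not the visualizer's implementation.

```python
def split_virtual_offset(voffset):
    """Split a BGZF virtual offset into (compressed block offset, offset within block)."""
    return voffset >> 16, voffset & 0xFFFF

def byte_ranges(chunks):
    """Merge (start_voffset, end_voffset) chunks into compressed-file block ranges.

    Each returned pair is (first block offset, offset of the block holding the chunk end).
    A real client would still need to read through the end of that last block.
    """
    ranges = []
    for start, end in sorted(chunks):
        block_start, _ = split_virtual_offset(start)
        block_end, _ = split_virtual_offset(end)
        if ranges and block_start <= ranges[-1][1]:
            # overlaps or touches the previous range, so extend it
            ranges[-1][1] = max(ranges[-1][1], block_end)
        else:
            ranges.append([block_start, block_end])
    return [tuple(r) for r in ranges]

# hypothetical chunks, written as (block offset << 16) | offset within block
chunks = [
    ((100 << 16) | 5, (200 << 16) | 10),
    ((200 << 16) | 10, (300 << 16) | 0),
    ((1000 << 16) | 0, (1200 << 16) | 7),
]
print(byte_ranges(chunks))
```

Merging matters because adjacent bins often point at neighbouring parts of the file, and one larger request is usually cheaper than several small ones.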
The third chart/table is the textual representation of which bins are being requested. If you click the button, it will actually go and fetch the data from the BAM file. This also demonstrates the short-circuiting action: not all the bins have to be requested, because the program can stop once it finds a read in the BAM file that is beyond the genomic coordinate range being requested. This chart is also responsive to zooming in on the first chart.
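The short-circuiting idea can be sketched as follows, assuming a coordinate-sorted BAM and chunks visited in file order, so that once one read starts past the end of the query every later read does too. The read_chunk callback and the toy data are hypothetical stand-ins for fetching and decoding a chunk of the BAM file.

```python
def fetch_overlapping(chunks, read_chunk, beg, end):
    """Collect reads overlapping the 0-based, half-open region [beg, end).

    chunks must be in file order, and read_chunk(chunk) must yield
    (read_start, read_end) pairs sorted by read_start.
    Returns (overlapping reads, number of chunks actually fetched).
    """
    hits = []
    fetched = 0
    for chunk in chunks:
        fetched += 1
        for read_start, read_end in read_chunk(chunk):
            if read_start >= end:
                # this read is past the query, so nothing later can overlap: stop early
                return hits, fetched
            if read_end > beg:
                hits.append((read_start, read_end))
    return hits, fetched

# toy stand-in for the BAM file: chunk name -> reads stored in that chunk
toy_bam = {
    "a": [(0, 50), (90, 150)],
    "b": [(120, 170), (500, 550)],
    "c": [(600, 650)],
}
hits, fetched = fetch_overlapping(["a", "b", "c"], lambda name: toy_bam[name], 100, 200)
print(hits, fetched)
```

In this toy case chunk "c" is never fetched, because a read in chunk "b" already starts beyond the requested range.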
Possible further work:
- Make a CRAM index (CRAI) visualizer
- Add CSI index support (this is not super interesting, just a modified BAI that essentially lets you customize the bin sizes to index genomes with chromosomes larger than 512Mbp)
- Show the linear index as well (present in BAI but not in CSI, and popularly used by indexcov)
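For context on the last item, the BAI linear index splits each reference into 16kbp windows and stores, for each window, the smallest virtual offset of any read overlapping it. A query can use it to skip chunks that end before that offset. The sketch below uses hypothetical function names and made-up offsets, and falls back to "no pruning" when the window is missing, which is an assumption made to keep the example conservative.

```python
LINEAR_SHIFT = 14  # linear index windows are 2**14 = 16,384 bp wide

def min_offset_for(linear_index, beg):
    """Smallest virtual offset that could hold a read overlapping position beg (0-based)."""
    window = beg >> LINEAR_SHIFT
    if window < len(linear_index):
        return linear_index[window]
    return 0  # no entry for this window: do not prune anything

def prune_chunks(chunks, linear_index, beg):
    """Drop (start_voffset, end_voffset) chunks that end at or before the minimum offset."""
    lowest = min_offset_for(linear_index, beg)
    return [(start, end) for start, end in chunks if end > lowest]

# made-up linear index for three windows, and made-up chunks
linear_index = [0, 1000, 5000]
chunks = [(10, 900), (950, 2000), (3000, 4000)]
print(prune_chunks(chunks, linear_index, 20000))  # position 20000 falls in window 1
```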