A BAM index (BAI) file lets you map loci on the genome to a range of byte offsets in a BAM file. It's essential for browsing large pileups in interactive visualizations like IGV and BioDalliance.
But at some point, the BAM file grows so large that its corresponding BAI file also gets unwieldy. For example, I have an 80GB BAM file with a 9MB BAI file. Loading the BAI file over even a relatively fast network takes many seconds, far longer than users of modern web pages are accustomed to waiting.
One solution to this problem would be to load only portions of the BAI file. For example, if I'm looking at chr20, there's no need to download the portions of the BAI file that deal with the other chromosomes. The BAI format doesn't lend itself well to random seeking, however: each reference's section has a variable length and the file contains no table of section offsets, so this would require some kind of index over the BAI file itself.
Is there a standard way to index a BAM Index file?
I wound up implementing a BAI indexer in Python (bai-indexer) and added support for this to BioDalliance.
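To illustrate the idea, here is a minimal sketch of how the byte range of each reference's section could be found by walking a BAI file once. It is not the actual bai-indexer code; the function name `index_bai` is hypothetical, and the layout it assumes (magic `BAI\1`, `n_ref`, then per reference a bin list followed by a linear index) is my reading of the BAI layout in the SAM/BAM specification.

```python
import struct


def index_bai(data):
    """Return a list of (start, end) byte offsets, one per reference.

    `data` is the full content of a .bai file as bytes. `end` is exclusive.
    Hypothetical helper for illustration only.
    """
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI file")
    (n_ref,) = struct.unpack_from("<i", data, 4)
    pos = 8
    ranges = []
    for _ in range(n_ref):
        start = pos
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        for _ in range(n_bin):
            # Each bin: bin id (uint32), chunk count (int32),
            # then n_chunk pairs of uint64 virtual offsets.
            _bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8 + 16 * n_chunk
        # Linear index: interval count (int32), then one uint64 per interval.
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4 + 8 * n_intv
        ranges.append((start, pos))
    return ranges
```

With such a table stored in a small side file, a client viewing chr20 could in principle fetch only that reference's slice of the BAI, for example with an HTTP Range request for bytes `start` through `end - 1`.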
There isn't, though it's probably not a bad idea. You might propose it on the samtools-devel mailing list.
Just making the observation that when we need to index the index file, something is evolving the wrong way.
Like the meme goes: I've indexed your index file so you can be indexing while indexing...
Perhaps the answer is the CRAM file format that was added to samtools:
CRAM goes mainline