As mentioned in other answers, BAM fulfills only part of your requirement: it gives you compression and random access, but not indexing. However, you can easily roll your own index using the BGZF API and a key-value store of your choice, e.g. Berkeley DB.
Here's an example using my cl-sam API, but you can substitute e.g. the C API that comes with Samtools or the Java API from Picard (or a SWIG wrapper). I'm just writing the data to a text file instead of BDB, but you get the idea:
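    (defun bamdex (bam-file index-file)
      ;; Write one "read-name virtual-offset" line per alignment in BAM-FILE.
      (with-open-file (index index-file :direction :output)
        (with-bgzf (bgzf bam-file :direction :input)
          ;; Consume the header so the stream is positioned at the first record.
          (read-bam-meta bgzf)
          (loop
             ;; Note the position before reading, i.e. where the record starts.
             for offset = (bgzf-tell bgzf)
             for record = (read-alignment bgzf)
             while record
             do (format index "~s ~d~%" (read-name record) offset)))))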
Keys and values:
"ENSDART00000000005_480_670_12d" 2269928771
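The key is the read name; the number is the BGZF virtual file offset of the record (see the SAM spec). It packs the file position of the compressed block into its upper 48 bits and the position within that block's uncompressed data into its lower 16 bits. Pass it to a BGZF seek to reach the record: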
    (with-bgzf (bgzf "test.bam")
      (bgzf-seek bgzf 2269928771)
      (read-name (read-alignment bgzf)))
gives
"ENSDART00000000005_480_670_12d"
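To make the virtual offset less of a black box, here is a rough Python sketch (standard library only, not cl-sam or Samtools code) of what a BGZF seek has to do with such a number. The function names are made up for illustration, and the reader assumes that every BGZF block is a complete gzip member of at most 64 KiB. It only returns the rest of one block, whereas a real BGZF reader would carry on into the following blocks when a record spans a block boundary.

```python
import zlib


def split_virtual_offset(voffset):
    """Return (start of compressed block in file, offset inside the inflated block)."""
    return voffset >> 16, voffset & 0xFFFF


def make_virtual_offset(block_start, within_block):
    """Inverse of split_virtual_offset."""
    return (block_start << 16) | within_block


def read_from_virtual_offset(path, voffset):
    """Inflate the block addressed by voffset and return its data from that point on.

    Hypothetical helper: it does not follow a record into the next block.
    """
    block_start, within_block = split_virtual_offset(voffset)
    with open(path, "rb") as handle:
        handle.seek(block_start)
        raw = handle.read(65536)  # assumed upper bound on one compressed block
    inflater = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    block = inflater.decompress(raw)  # stops at the end of the first gzip member
    return block[within_block:]


# The offset from the index line above.
print(split_virtual_offset(2269928771))  # -> (34636, 23875)
```

So the record above starts 23875 bytes into the uncompressed contents of the block that begins at byte 34636 of the BAM file.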
If you key on the read sequence rather than the read name and the reads are very long, you might consider using a sequence checksum as the key instead of the actual sequence.
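As a rough illustration of the checksum idea, here is a minimal Python sketch that uses the standard `dbm` module as a stand-in for Berkeley DB. The function names, the choice of SHA-256 as the "checksum" and the upper-casing of the sequence are all my assumptions, not part of cl-sam or Samtools. Identical sequences map to the same key, so a later record silently overwrites an earlier one; storing a list of offsets per key would avoid that.

```python
import dbm
import hashlib


def sequence_key(sequence):
    """Fixed-length key for a read: hex SHA-256 of the upper-cased bases."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


def build_index(db, records):
    """Store virtual offsets keyed by sequence checksum.

    `records` is any iterable of (sequence, virtual_offset) pairs, e.g. produced
    by a loop like the one in bamdex. Duplicate sequences overwrite each other.
    """
    for sequence, voffset in records:
        db[sequence_key(sequence)] = str(voffset)


def lookup(db, sequence):
    """Return the stored virtual offset for `sequence`, or None if absent."""
    value = db.get(sequence_key(sequence))
    return None if value is None else int(value.decode("ascii"))
```

Usage would look something like `with dbm.open("reads.idx", "c") as db: build_index(db, records)`, where `reads.idx` is a placeholder file name, followed by `lookup(db, some_sequence)` and a BGZF seek to the offset it returns.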
Good info. I know samtools has a "TO DO" list, but I wonder if anyone has already done an implementation of an indexed BAM file...
Might be added to Biopython shortly, see here