Can the chromosome names and lengths to which a BAM file was aligned be determined solely from its BAI file?
No. The BAI format is described in §5.2 of SAMv1.pdf and does not contain the chromosome names or lengths. Instead it merely identifies reference sequences by their index, 0 <= n < n_ref, in the corresponding BAM file's (binary) header.
The CSI format operates similarly.
The Tabix format OTOH does contain the names of reference sequences (but not their lengths).
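To make that concrete, here is a minimal sketch of a BAI reader, assuming the layout given in §5.2 of SAMv1.pdf (little-endian; magic, `n_ref`, then per-reference bins and linear-index offsets). The function name `read_bai_summary` is my own and is not part of any library. Note that nothing in the loop ever reads a name or a length: references are distinguished only by their position in the file.

```python
import io
import struct

def read_bai_summary(stream):
    """Return (n_ref, [n_bin for each reference]) from a binary BAI stream.

    Hypothetical helper for illustration; field layout assumed from SAMv1 §5.2.
    """
    if stream.read(4) != b"BAI\x01":
        raise ValueError("not a BAI file")
    (n_ref,) = struct.unpack("<I", stream.read(4))
    bins_per_ref = []
    for _ in range(n_ref):  # references are identified by index only
        (n_bin,) = struct.unpack("<I", stream.read(4))
        for _ in range(n_bin):
            _bin_id, n_chunk = struct.unpack("<II", stream.read(8))
            # Each chunk is a (chunk_beg, chunk_end) pair of uint64 virtual offsets.
            stream.seek(16 * n_chunk, io.SEEK_CUR)
        (n_intv,) = struct.unpack("<I", stream.read(4))
        # Linear index: one uint64 virtual offset per 16 kbp window.
        stream.seek(8 * n_intv, io.SEEK_CUR)
        bins_per_ref.append(n_bin)
    return n_ref, bins_per_ref
```

Called as `read_bai_summary(open("sample.bam.bai", "rb"))` (the filename is a placeholder), this tells you how many reference sequences the BAM header declared, but not what they are called or how long they are.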
John Marshall is one of the SAMtools maintainers so we can accept this answer as the "correct" one.
I am at present one of the editors of the SAM specification and a former SAMtools maintainer. But in fact this answer, like any other, should be accepted as correct or not based on its own merits, not its author — I included links to the various specifications precisely so that the claims made can be verified, and as references for further detail.
The header of the bam contains that information.
That is not what OP is asking though.
Based on section 5.2 of the SAM specification, this information should (I guess) be in the index, but I do not have a strategy ready to extract it. https://samtools.github.io/hts-specs/SAMv1.pdf
OK, see the answer below.
As for how to do it in practice, this thread may be relevant: Make bam index human readable
I found the "Make bam index human readable" thread very useful for exploring BAI files, especially Pierre Lindenbaum's tool for dumping the BAI as XML. The output of Pierre's tool does include the names and lengths of the chromosomes, but they are pulled from the BAM header, not the BAI file. This fits with John Marshall's answer below.
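For comparison, here is a rough sketch of where the names and lengths do come from: the binary header at the start of the BAM itself. It assumes the header layout from SAMv1.pdf (magic, `l_text`, header text, `n_ref`, then `l_name`, `name`, `l_ref` per reference) and relies on BGZF being readable as ordinary multi-member gzip. The function name `read_bam_references` is made up for this example.

```python
import gzip
import struct

def read_bam_references(source):
    """Return [(name, length), ...] from a BAM file's binary header.

    `source` may be a path or a binary file object. Hypothetical helper;
    field layout assumed from the BAM section of SAMv1.
    """
    with gzip.open(source, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<I", fh.read(4))
        fh.read(l_text)  # skip the plain-text SAM header
        (n_ref,) = struct.unpack("<I", fh.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", fh.read(4))
            # The stored name is NUL-terminated; l_name counts the NUL.
            name = fh.read(l_name).rstrip(b"\x00").decode("ascii")
            (l_ref,) = struct.unpack("<I", fh.read(4))
            refs.append((name, l_ref))
        return refs
```

The position of each tuple in the returned list is the index `n` that the BAI uses. In day-to-day work `samtools view -H` or pysam's `AlignmentFile.references` and `AlignmentFile.lengths` give the same information without hand-parsing.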