Hi,
After reading http://samtools.sourceforge.net/SAM1.pdf , I have learnt that BAI files are indexes to BAM files. I also understand the BGZF and virtual file offset concepts.
Example BAM file, printed as text:
@SQ SN:CHR21 LN:1000
@SQ SN:CHR22 LN:2000
read100 16 CHR21 33028084 255 50M * 0 0 ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< NM:i:0
read101 16 CHR21 33028087 255 50M * 0 0 TAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATTATC <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< NM:i:0
...many CHR21 segments
read200 16 CHR22 33028084 255 50M * 0 0 ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< NM:i:0
read201 16 CHR22 33028084 255 50M * 0 0 ATTTAAAAATTAATTTAATGCTTGGCTAAATCTTAATTACATATATAATT <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< NM:i:0
...many CHR22 segments
I am having trouble understanding what the entries in a BAI file point to. Can somebody please explain with respect to a specific file (or use the example above)? What does the first index entry point to? What does the second one point to? And so on.
If there is a document that I should read, please point me to it.
Thanks in advance.
The "specific parts" are the parts that the user is asking for. For example, you ask for all reads on chr22 between positions 1000000 and 2000000. The bai file will tell you (roughly) where (think "byte offsets") inside the bam file these reads will be found, instead of "you" having to start from the beginning of the bam file and parse every read until you get to chr22 position 1 million. As for what an index is: ask yourself what the purpose of a book index is and you're close. A "database index" is also close.
No, it doesn't. It says that bai "allows programs to jump directly to specific parts of the bam file without reading through all of the sequences". But to which parts? What is an index?
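To make "which parts" a bit more concrete, here is a minimal Python sketch, assuming the BAI layout and binning scheme described in the indexing section of the SAM spec linked above. The function names (`split_virtual_offset`, `reg2bin`, `read_bai`) are my own and not part of samtools or any library. The idea: for each reference sequence the index stores a set of bins, each bin holds a list of chunks, and each chunk is a pair of virtual file offsets (start, end) into the BAM file. There is also a linear index with one virtual offset per 16 kbp window.

```python
import struct

def split_virtual_offset(voffset):
    """Split a 64-bit virtual offset into (byte offset of the BGZF block
    in the .bam file, offset inside that block once it is decompressed)."""
    return voffset >> 16, voffset & 0xFFFF

def reg2bin(beg, end):
    """Bin number for a 0-based, half-open interval [beg, end).
    Python port of the reg2bin routine given in the SAM spec."""
    end -= 1
    if beg >> 14 == end >> 14:
        return 4681 + (beg >> 14)   # 16 kbp bins
    if beg >> 17 == end >> 17:
        return 585 + (beg >> 17)    # 128 kbp bins
    if beg >> 20 == end >> 20:
        return 73 + (beg >> 20)     # 1 Mbp bins
    if beg >> 23 == end >> 23:
        return 9 + (beg >> 23)      # 8 Mbp bins
    if beg >> 26 == end >> 26:
        return 1 + (beg >> 26)      # 64 Mbp bins
    return 0                        # the single 512 Mbp bin

def read_bai(data):
    """Parse the raw bytes of a .bai file.
    Returns one dict per reference sequence, in @SQ order:
      'bins'   -> {bin number: [(chunk_beg, chunk_end), ...]}
      'linear' -> [virtual offset for each 16 kbp window]
    Assumption: samtools may also write a pseudo-bin (37450) carrying
    metadata; this sketch returns it like any other bin."""
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI file")
    pos = 4
    (n_ref,) = struct.unpack_from("<i", data, pos)
    pos += 4
    refs = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        bins = {}
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            chunks = []
            for _ in range(n_chunk):
                chunks.append(struct.unpack_from("<QQ", data, pos))
                pos += 16
            bins[bin_id] = chunks
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4
        linear = list(struct.unpack_from("<%dQ" % n_intv, data, pos))
        pos += 8 * n_intv
        refs.append({"bins": bins, "linear": linear})
    return refs
```

With the example above, the first entry of the returned list would describe CHR21 and the second CHR22. read100 starts at 1-based position 33028084 and spans 50 bases, so `reg2bin(33028083, 33028133)` puts it in bin 6696. A query for a region looks up the bins that can overlap it, takes their chunks, and for each chunk seeks to the BGZF block given by `split_virtual_offset(chunk_beg)[0]`, decompresses it, and starts reading alignments at the second value. So each index entry points to a stretch of the BAM file that contains reads for that bin, not to an individual read.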