The UCSC binning scheme was suggested by Richard Durbin and Lincoln Stein and is explained by Kent et al. (2002). In this scheme, each bin represents a contiguous genomic region that may be fully contained in another bin; each alignment is assigned to the bin representing the smallest region that contains the entire alignment. The binning scheme is essentially another representation of an R-tree: each distinct bin corresponds to a distinct internal node of the tree, and bin A is a child of bin B if region A is contained in region B.
In BAM, each bin may span 2^29, 2^26, 2^23, 2^20, 2^17 or 2^14 bp. Bin 0 spans a 512Mbp region, bins 1-8 span 64Mbp each, bins 9-72 8Mbp, bins 73-584 1Mbp, bins 585-4680 128Kbp and bins 4681-37448 span 16Kbp regions. To find the alignments that overlap a region [rbeg,rend), we calculate the list of bins that may overlap the region and then test the alignments in those bins to confirm the overlaps. If the specified region is short, typically only a few alignments in six bins (one per level) need to be retrieved, so the overlapping alignments can be fetched quickly.
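The two calculations described above (assigning a bin to an alignment, and listing the candidate bins for a query region) can be sketched in a few lines of Python. This is only an illustration based on the bin sizes quoted above: the function names reg2bin and reg2bins mirror the C routines in the SAM specification, while LEVELS and _check are names made up here. It assumes 0-based, half-open coordinates [beg,end) and positions below 2^29.

```python
# (first bin id, shift) for each level; a bin at that level spans 2**shift bp.
# Values follow the BAM bin layout quoted above (512Mbp down to 16Kbp).
LEVELS = [(0, 29), (1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)]


def _check(beg, end):
    # Assumption of this sketch: 0-based half-open interval within 2**29 bp.
    if not (0 <= beg < end <= 1 << 29):
        raise ValueError("expected 0 <= beg < end <= 2**29")


def reg2bin(beg, end):
    """Return the smallest bin that fully contains [beg, end)."""
    _check(beg, end)
    end -= 1  # last base covered by the interval
    # Try the finest level first and move up until both ends share a bin.
    for offset, shift in reversed(LEVELS[1:]):
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # only the 512Mbp root bin contains the interval


def reg2bins(beg, end):
    """Return every bin that may hold alignments overlapping [beg, end)."""
    _check(beg, end)
    end -= 1
    bins = []
    for offset, shift in LEVELS:
        first = offset + (beg >> shift)
        last = offset + (end >> shift)
        bins.extend(range(first, last + 1))
    return bins


if __name__ == "__main__":
    print(reg2bin(0, 1000))    # bin of a short alignment at the start
    print(reg2bins(0, 1000))   # candidate bins for a short query region
```

For a short region such as [0,1000) the list contains exactly one bin per level, which is the "six bins" case mentioned above. A region that crosses a 16Kbp boundary, such as [16000,17000), is assigned to a 128Kbp bin and yields one extra candidate bin at the finest level.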
Hi Pierre,
I'm trying to work with the 'bin' field in the UCSC tables.
I've read your blog http://plindenbaum.blogspot.it/2010/05/binning-genome.html about that.
I usually work with Perl and I'm not familiar with Java at all, so it is nearly impossible for me to translate your Java code.
The only articles that explain how to manage this field give few details.
Do you know a Perl script that does the same thing?
Otherwise, can you suggest a more detailed article?
Thanks in advance
Another resource: http://genomewiki.ucsc.edu/index.php/Bin_indexing_system