Question

The 'Bin' Column Used By SAM, UCSC...

4

15.0 years ago

Pierre Lindenbaum 166k

Hi all,

Some MySQL tables at UCSC use a special column named 'bin'. For example, in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp130.sql:

CREATE TABLE `snp130` (
  `bin` smallint(5) unsigned NOT NULL default '0',
  (...)

It is not a primary key, and it seems that this bin column is also used by samtools (e.g. http://samtools.sourceforge.net/tabix.shtml )

What is that column? How is it used?

Pierre

database index • 5.6k views

Updated 6.6 years ago by Ram 45k • written 15.0 years ago by Pierre Lindenbaum 166k

3

15.0 years ago

Fred Fleche 4.3k

Hello Pierre,

https://lists.soe.ucsc.edu/pipermail/genome/2010-April/021993.html

Hope this helps

Updated 6.6 years ago by Ram 45k • written 15.0 years ago by Fred Fleche 4.3k

0

thanks but it doesn't say how it works :-)

Reply 15.0 years ago by Pierre Lindenbaum 166k

0

12.9 years ago

Ettore Rizzo • 0

Hi Pierre, I'm dealing with the 'bin' field in a UCSC table. I've read your blog post about it: http://plindenbaum.blogspot.it/2010/05/binning-genome.html. I usually work with Perl and I'm not familiar with Java at all, so it is nearly impossible for me to translate your Java code. The only articles that explain how to handle this field give few details. Do you know of a Perl script that does the same thing? Otherwise, can you suggest a more detailed article? Thanks in advance

1

See Heng Li's code: https://github.com/lh3/misc/blob/master/biodb/batchUCSC.pl

Reply 12.9 years ago by Pierre Lindenbaum 166k

Ram · Accepted Answer · 2010-05-05

Sorry, I found an answer to my question in http://samtools.sourcearchive.com/documentation/0.1.6~dfsg/bam__index_8c-source.html

The UCSC binning scheme was suggested by Richard Durbin and Lincoln Stein and is explained by Kent et al. (2002). In this scheme, each bin represents a contiguous genomic region which can be fully contained in another bin; each alignment is associated with the bin that represents the smallest region containing the entire alignment. The binning scheme is essentially another representation of an R-tree. A distinct bin uniquely corresponds to a distinct internal node in an R-tree. Bin A is a child of Bin B if region A is contained in B.
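To make "the smallest region containing the entire alignment" concrete, here is a minimal Python sketch of that lookup. It assumes the six-level BAM layout described in the next paragraph (bin sizes 2^14 to 2^29 bp, eight children per bin) and 0-based, half-open coordinates; the function name `reg2bin` follows the name used in the SAM specification, but the Python transcription is mine and only illustrative.

```python
# Sketch only: assumes the BAM bin layout (levels of 2^14 .. 2^26 bp
# below a single 2^29 bp root bin) and 0-based half-open intervals.

# (shift, offset) per level, finest first. `offset` is the id of the
# first bin at that level: 4681, 585, 73, 9, 1.
LEVELS = ((14, 4681), (17, 585), (20, 73), (23, 9), (26, 1))


def reg2bin(beg: int, end: int) -> int:
    """Return the smallest bin that fully contains [beg, end)."""
    end -= 1  # last base actually covered by the interval
    for shift, offset in LEVELS:
        # Both ends fall in the same bin at this level, so it contains
        # the whole interval and is the smallest one that does.
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # only the 512 Mbp root bin contains the interval


if __name__ == "__main__":
    print(reg2bin(0, 100))          # fits in the first 16 Kbp bin
    print(reg2bin(16383, 16385))    # straddles a 16 Kbp boundary
```

An interval that crosses a 16 Kbp boundary is pushed up to the enclosing 128 Kbp bin, and so on up to bin 0.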

In BAM, each bin may span 2^29, 2^26, 2^23, 2^20, 2^17 or 2^14 bp. Bin 0 spans a 512Mbp region, bins 1-8 span 64Mbp, 9-72 8Mbp, 73-584 1Mbp, 585-4680 128Kbp and bins 4681-37448 span 16Kbp regions (1 + 8 + 64 + 512 + 4096 + 32768 = 37449 bins in total, numbered from 0). To find the alignments that overlap a region [rbeg,rend), we calculate the list of bins that may overlap the region and then test the alignments in those bins to confirm the overlaps. If the specified region is short, typically only a few alignments in six bins need to be retrieved, so the overlapping alignments can be fetched quickly.
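The query side can be sketched in the same way. The Python below lists every bin that may hold alignments overlapping a region, assuming the BAM layout quoted above and 0-based, half-open coordinates. The name `reg2bins` mirrors the SAM specification; the Python version is an illustrative transcription, not code taken from samtools.

```python
# Sketch only: candidate bins for a region query under the BAM layout.
from typing import List

# (shift, offset) per level, coarsest first, below the root bin 0.
LEVELS = ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681))


def reg2bins(rbeg: int, rend: int) -> List[int]:
    """Return all bins whose region overlaps [rbeg, rend)."""
    rend -= 1          # last base covered by the query
    bins = [0]         # the root bin overlaps every query
    for shift, offset in LEVELS:
        first = offset + (rbeg >> shift)
        last = offset + (rend >> shift)
        bins.extend(range(first, last + 1))
    return bins


if __name__ == "__main__":
    # A short region touches one bin per level: six bins in total.
    print(reg2bins(0, 100))
    # Crossing a 16 Kbp boundary adds one extra bin at the finest level.
    print(reg2bins(16000, 17000))
```

Alignments stored in the returned bins are only candidates; each one still has to be checked against [rbeg,rend) because a bin's region is usually larger than the query.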