Question

How To Read A Bigwig File Using Python

0

12.7 years ago

Fidel ★ 2.0k

Hi everyone,

I am using bx-python to read a bigWig file. However, the current implementation is very slow. In my benchmarks, a single query takes about 0.02 seconds on a 2.5 GHz server, and I need to run thousands of queries. Through parallelization and other tricks I can read the bigWig file faster, but I wonder whether anyone knows of another library or way to query a bigWig file using Python. The Perl library for reading bigWig files is very fast, but I would prefer a Python solution.

Update to clarify the problem:

"""
usage: %prog bigwig_file.bw < bed_file.bed
"""
import sys
import time

import numpy as np
from bx.bbi.bigwig_file import BigWigFile
from bx.intervals.io import GenomicIntervalReader

# bigWig is a binary format, so open the file in binary mode
bw = BigWigFile(open(sys.argv[1], "rb"))
ll = []  # elapsed time of each query, in seconds
for interval in GenomicIntervalReader(sys.stdin):
    start = time.time()
    # query the interval, asking for 20 summary points
    bw.query(interval.chrom, interval.start, interval.end, 20)
    total = time.time() - start
    ll.append(total)

# average time per query
print(np.mean(ll))

This Python script prints the average time per call. For a BED file containing thousands of lines, this code may take half an hour, while a Perl counterpart using the Bio-BigFile library takes only a few seconds.

python bigwig • 13k views

ADD COMMENT • link updated 11.2 years ago by tszn1984 ▴ 100 • written 12.7 years ago by Fidel ★ 2.0k

1

Which Perl library are you using?

ADD REPLY • link 12.7 years ago by Tommy Carstensen ▴ 40

1

Hi Marcin, I profiled the Python code: each call to the BigWigFile query() method triggers 10,000 calls to the read_and_unpack() method, which seems to be causing the slowdown. I reported the issue to the developers because I could not solve the problem by analysing the code (see https://bitbucket.org/james_taylor/bx-python/issue/38/read-a-bigwig-file-is-slow).
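For readers who want to reproduce this kind of measurement, here is a minimal sketch using the standard-library cProfile module. The names profile_call, query_stub and read_and_unpack_stub are hypothetical stand-ins introduced only for illustration; they are not part of bx-python, and the stub merely imitates the "many tiny calls per query" pattern described above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def read_and_unpack_stub(i):
    # Hypothetical stand-in for one small per-record read
    return i * 2


def query_stub(n):
    # Hypothetical stand-in for a query that makes n tiny calls
    return sum(read_and_unpack_stub(i) for i in range(n))


result, report = profile_call(query_stub, 10000)
print(report)
```

With a real file, the same helper could presumably be called as profile_call(bw.query, chrom, start, end, 20); the ncalls column of the report then shows how often each function ran per query.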

ADD REPLY • link 12.7 years ago by Fidel ★ 2.0k

1

Looks like the loop in query() is not cythonized; cythonizing it might help: https://bitbucket.org/james_taylor/bx-python/src/38dc8eb987fb/lib/bx/bbi/bbi_file.pyx#cl-215

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 12.7 years ago by brentp 24k

0

Just to be sure: are you saying that random access to a bigWig file in bx-python is considerably slower than in Perl? Maybe profile the Python code and see where the time goes.

ADD REPLY • link 12.7 years ago by Marcin Cieslik ▴ 520

score 2 · Answer 1 · 2013-10-04

I wrote a BigWigFile class that wraps Kent's library. It is very convenient to use and very fast. The manual is here: http://tsznxx.appspot.com/BigWigFile

The source code can be downloaded from GitHub: git clone git://github.com/tsznxx/wLib.git wLib. You just need to (1) go into external/Kentlib and run make, then (2) go into wWigIO, change the Python.h path to the location of your Python.h, and run make. The resulting wWigIO.so and BigWigFile.py are what you need.

Please contact me if you have any questions.

score 0 · Answer 2 · 2012-03-28

0

12.7 years ago

brentp 24k

If you're doing random access, try the Cython version. Examples here

I think you can do:

bw.query("chr1", 10000, 20000, 1)

To get all the features on chromosome 1 between 10000 and 20000.

ADD COMMENT • link 12.7 years ago by brentp 24k

0

This is exactly what I am doing, but each call to the query method is rather slow.

ADD REPLY • link 12.7 years ago by Fidel ★ 2.0k

score 0 · Answer 3 · 2012-04-12

Reading the source, it appears that query() takes the results from summarize(), which returns an object containing NumPy arrays. query() then iterates over those results and repackages them into a dictionary.

Is it fast enough for your purposes if you use the summarize() method directly? I just did some quick benchmarks and, while it greatly depends on the underlying data and the query, summarize() can be up to 3x faster than query().
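As a rough sketch of what working with the summarize() result directly could look like, the function below computes per-bin means from per-bin sums and counts. It assumes the summary object exposes sum_data and valid_count attributes; that is my reading of the bx-python source and should be checked against your installed version. The Summary namedtuple is a hypothetical stand-in so the example runs without a bigWig file.

```python
import math
from collections import namedtuple

# Hypothetical stand-in; field names are assumed to mirror the object
# returned by summarize() in bx-python
Summary = namedtuple("Summary", ["valid_count", "sum_data"])


def bin_means(summary):
    """Return the mean of each bin, or NaN where a bin has no covered bases."""
    means = []
    for total, count in zip(summary.sum_data, summary.valid_count):
        means.append(total / count if count > 0 else math.nan)
    return means


# Three bins: 10 covered bases, no coverage, 5 covered bases
example = Summary(valid_count=[10, 0, 5], sum_data=[25.0, 0.0, 10.0])
means = bin_means(example)
print(means)
```

With a real file this would presumably be called as bin_means(bw.summarize(chrom, start, end, 20)). Since the attributes are NumPy arrays in that case, a vectorized division would avoid the Python-level loop altogether.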