Hi everyone,
I am using bx-python to read a bigWig file, but the current implementation is very slow: in my benchmarks a single query takes about 0.02 seconds on a 2.5 GHz server, and I need to run thousands of queries. With parallelization and other tricks I can read the bigWig faster, but does anyone know of another library or method for querying a bigWig file from Python? The Perl library for reading bigWig files is very fast, but I would prefer a Python solution.
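For reference, the "parallelization" I mention looks roughly like the sketch below. The query_interval function and the interval tuples are placeholders of my own, not bx-python API; in a real run each worker process would open its own BigWigFile handle, since the handle cannot be pickled across processes.

```python
from multiprocessing import Pool

def query_interval(interval):
    # Placeholder for the real per-interval work, e.g. calling
    # bw.query(chrom, start, end, 20) on a per-process BigWigFile handle.
    chrom, start, end = interval
    return end - start

def run_parallel(intervals, processes=4):
    # Fan the interval list out over a pool of worker processes.
    with Pool(processes) as pool:
        return pool.map(query_interval, intervals, chunksize=64)

if __name__ == "__main__":
    intervals = [("chr1", i * 100, i * 100 + 50) for i in range(1000)]
    results = run_parallel(intervals)
    print(len(results))  # 1000
```

This helps when queries dominate runtime, but it is still a workaround rather than a fix for the per-query cost.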
Update to clarify the problem:
"""
usage: %prog bigwig_file.bw < bed_file.bed
"""
from bx.intervals.io import GenomicIntervalReader
from bx.bbi.bigwig_file import BigWigFile
import numpy as np
import time
import sys
bw = BigWigFile( open( sys.argv[1] ) )
ll = []
for interval in GenomicIntervalReader( sys.stdin ):
    start = time.time()
    bw.query( interval.chrom, interval.start, interval.end, 20 )
    total = time.time() - start
    ll.append( total )
print np.mean(ll)
This Python script prints the average time per call. For a BED file containing thousands of lines, this code can take half an hour, while a Perl counterpart using the Bio-BigFile library takes only a few seconds.
Which Perl library are you using?
Hi Marcin, I profiled the Python code: for each call to the BigWigFile query() method there are 10,000 calls to the read_and_unpack() method, which seems to be causing the slowdown. I reported the issue to the developers because I could not solve the problem by analysing the code (see https://bitbucket.org/james_taylor/bx-python/issue/38/read-a-bigwig-file-is-slow).
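The call count above can be confirmed with cProfile. Below is a minimal, self-contained sketch against stand-in functions (fake_query and read_and_unpack here are my own placeholders, not the bx-python internals); to reproduce the real measurement, profile the actual bw.query(...) call instead.

```python
import cProfile
import io
import pstats

def read_and_unpack():
    # Stand-in for the low-level record reader inside bx.bbi.
    return 0

def fake_query():
    # Simulate one query fanning out into many low-level reads.
    for _ in range(10000):
        read_and_unpack()

profiler = cProfile.Profile()
profiler.enable()
fake_query()
profiler.disable()

# Print only the stats rows matching the function of interest;
# the ncalls column shows how many times it was invoked.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats("read_and_unpack")
print(buf.getvalue())
```

If the same pattern shows up against the real library (thousands of tiny reads per query), the fix likely belongs in the library's inner loop rather than in caller code.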
Looks like the loop in query() is not cythonized; fixing that might help: https://bitbucket.org/james_taylor/bx-python/src/38dc8eb987fb/lib/bx/bbi/bbi_file.pyx#cl-215

Just to be sure: you are saying that random access of a bigWig file in bx-python is considerably slower than in Perl? Maybe profile the Python code and see where the time goes.