6.6 years ago
endrebak
https://github.com/endrebak/ncls
Just released. Bug reports welcome.
I should make better timings; I did not explore the full space of possibilities in the first announcement. Still, it seems to be many times faster than intervaltree for most of my uses.
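from ncls import NCLS
import pandas as pd

# Five intervals [0, 100), [1, 101), ..., [4, 104); ids equal the starts
starts = pd.Series(range(0, 5))
ends = starts + 100
ids = starts

# Build the index from the underlying NumPy arrays of starts, ends and ids
ncls = NCLS(starts.values, ends.values, ids.values)

# Query the range [0, 2); each hit is a (start, end, id) tuple
it = ncls.find_overlap(0, 2)
for i in it:
    print(i)
# (0, 100, 0)
# (1, 101, 1)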
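To sanity-check results like the ones above, a brute-force scan answers the same query directly. This is a minimal sketch assuming half-open intervals, which matches the output shown (the interval starting at 2 is not reported for the query (0, 2)). The function name `brute_force_overlap` is hypothetical and not part of ncls; it costs O(n) per query, so it is only useful as a reference when testing.

```python
def brute_force_overlap(starts, ends, ids, qstart, qend):
    """Return (start, end, id) for every interval overlapping [qstart, qend).

    Assumes half-open intervals: [s, e) overlaps [qstart, qend)
    exactly when s < qend and e > qstart.
    """
    return [
        (s, e, i)
        for s, e, i in zip(starts, ends, ids)
        if s < qend and e > qstart
    ]


# Same toy data as the ncls example, using plain Python lists
starts = list(range(0, 5))
ends = [s + 100 for s in starts]
ids = starts
print(brute_force_overlap(starts, ends, ids, 0, 2))
# [(0, 100, 0), (1, 101, 1)]
```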
I think it's supposed to be faster for short intervals?
Yes, brentp kindly pointed out some possible errors in my timings. Still, it seems many times faster for most of the use cases I have tried. GenomicRanges has also switched from intervaltree to NCLS.
Could you elaborate on use cases for this?
To me, bioinformatics is basically range overlap, so it speeds up all my work a lot. My reason for wrapping it is that I am writing a GenomicRanges for Python and want incredibly fast object creation and overlap/intersection queries.
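For anyone curious what is being wrapped: a nested containment list sorts intervals by start (ties broken longest first), nests each interval under the interval that contains it, and answers a query by binary-searching each relevant sublist. Below is a minimal pure-Python sketch of that idea, assuming half-open (start, end, id) tuples. `SimpleNCList` and `overlaps` are hypothetical names; this is not the ncls package's code or API.

```python
class SimpleNCList:
    """Minimal nested containment list, for illustration only."""

    def __init__(self, intervals):
        # Sort by start ascending, then end descending, so every container
        # comes before the intervals it contains.
        ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
        self.top = []  # top-level sublist of (interval, children) nodes
        stack = []     # chain of open containers, outermost first
        for iv in ordered:
            node = (iv, [])
            # Drop containers whose end falls short of this interval's end.
            while stack and stack[-1][0][1] < iv[1]:
                stack.pop()
            # Nest under the innermost remaining container, else top level.
            (stack[-1][1] if stack else self.top).append(node)
            stack.append(node)

    @staticmethod
    def _first_ending_after(sublist, qstart):
        # Binary search for the first sibling whose end lies past qstart.
        lo, hi = 0, len(sublist)
        while lo < hi:
            mid = (lo + hi) // 2
            if sublist[mid][0][1] > qstart:
                hi = mid
            else:
                lo = mid + 1
        return lo

    def overlaps(self, qstart, qend):
        """Return sorted (start, end, id) tuples overlapping [qstart, qend)."""
        hits = []
        pending = [self.top]
        while pending:
            sublist = pending.pop()
            i = self._first_ending_after(sublist, qstart)
            # Starts also increase along a sublist: stop at the first start >= qend.
            while i < len(sublist) and sublist[i][0][0] < qend:
                interval, children = sublist[i]
                hits.append(interval)
                # Children lie inside their parent, so only search them on a hit.
                if children:
                    pending.append(children)
                i += 1
        return sorted(hits)


tree = SimpleNCList([(s, s + 100, s) for s in range(0, 5)] + [(10, 20, 99)])
print(tree.overlaps(0, 2))
# [(0, 100, 0), (1, 101, 1)]
```

The binary search is valid because siblings in one sublist never contain each other, so both their starts and their ends increase along the list.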