As pysam is compiled C, I've never been able to run pysam under pypy.
This is a shame, since for most bioinformatics operations (string manipulation, typical data structures, etc.) pypy is considerably faster than Python 2.x.
I think it's probably worth getting pysam to run under pypy if possible - but before I start down that road, has anyone already figured out how to get it to work? :)
Because this problem comes up quite frequently for me, I reimplemented a lot of the pysam functionality in python code so it could be run with pypy. Feel free to fork it and clean it up. https://github.com/nijibabulu/pypysam/
Good job, Rob! I wish you had mentioned it a long time ago, before I tried the same thing here. My code only works for BAMs, but it looks like yours does FASTA and everything else in htslib too! Very nice work :)
hts-python is another Python wrapper for htslib, the C library underlying pysam and samtools. It uses CFFI instead of Cython and is compatible with PyPy. It is a less mature project, but the author (brentp) has shown some promising performance benchmarks.
I also highly recommend hts-python, for what it's worth. Actually anything made by Brent. It's probably the most stable thing right now if you want to read BAMs on pypy.
Having said that, pure-python methods like the ones Robert and I posted are probably a better idea going forward than trying to hook htslib, assuming all you want to do is read a BAM file. There's no C to compile, it's just as fast if not faster, and there are no dependencies on other python or C projects that users may or may not have.
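To give a rough idea of what "pure python" means here, below is a minimal sketch of reading just the BAM header using only the standard library. It is not taken from either of the projects mentioned above; the function name read_bam_header is made up for illustration. It relies on two things from the SAM/BAM specification: a BAM file is BGZF-compressed (a series of concatenated gzip members, which the gzip module can read as one stream), and the header is a magic string followed by little-endian length-prefixed fields.

```python
import gzip
import struct


def read_bam_header(fileobj):
    """Return (header_text, [(ref_name, ref_length), ...]) from a BAM stream.

    Hypothetical helper for illustration; fileobj is any binary file object.
    """
    # BGZF blocks are ordinary gzip members, so GzipFile decompresses them
    # transparently (sequential reads only, no random access via the index).
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<I", bam.read(4))
        # The SAM-style header text may be NUL-padded.
        text = bam.read(l_text).rstrip(b"\x00").decode("ascii", errors="replace")
        (n_ref,) = struct.unpack("<I", bam.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<I", bam.read(4))
            # Reference names are stored NUL-terminated.
            name = bam.read(l_name).rstrip(b"\x00").decode("ascii")
            (l_ref,) = struct.unpack("<I", bam.read(4))
            refs.append((name, l_ref))
    return text, refs
```

You would call it as read_bam_header(open("reads.bam", "rb")), where reads.bam is a placeholder path. Alignment records follow the header in the same stream and can be unpacked the same way, which is where most of the real work in a full reader lies.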
If pysam spends most of its time in zlib or the samtools/htslib C code, pypy won't help.
Yes, but most python programs that make use of pysam do something with the read data: string manipulation, moving things around in memory, comparisons, etc.
So I appreciate we're not going to be able to read BAM files quicker, but our scripts overall would be a lot faster :)
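For a concrete picture of the per-read work I mean, here is a small sketch. The function names are made up for illustration and the sequences are assumed to be plain strings already pulled out of the reads. Running the same script under CPython and under pypy and comparing the elapsed times is one way to check whether the speedup is worth chasing for a given workload.

```python
import time

# Complement table; anything unexpected is treated as N.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse-complement a DNA string one base at a time."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return gc / float(len(seq))


def time_processing(seqs):
    """Run the per-read work over seqs; return (elapsed_seconds, gc_total)."""
    start = time.time()
    total = 0.0
    for seq in seqs:
        total += gc_fraction(reverse_complement(seq))
    return time.time() - start, total
```

None of this touches zlib or htslib, so it is the part of a script where the interpreter itself is the bottleneck.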