Hi Biostars,
I'm currently developing bioinformatics software for my Master's internship. I have come across a question that doesn't seem to be answered on this site or anywhere else, and yet it feels like an elephant in the room:
Is Biopython a "good" package? Is it well implemented? Does it have a good reputation? I came across several alternatives to it (BioNumPy and scikit-bio, to name two) that proudly announce that their implementation is entirely based on C++ libraries, or that their in-memory storage of biological sequences elegantly uses the properties of NumPy arrays.
But it seems to me that this SHOULD be the case for every package dealing with biological sequences, right?
Have any of you left Biopython for something better? I don't want to invest my time in something that is not well optimized or not popular in the community.
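To make the "sequences as NumPy arrays" idea concrete, here is a minimal sketch, not taken from any of the packages named above. The helper names `encode` and `gc_fraction_np` are hypothetical and exist only for this illustration; the sketch assumes plain ASCII DNA strings.

```python
import numpy as np


def encode(seq: str) -> np.ndarray:
    """Store a DNA string as one unsigned byte per base (hypothetical helper)."""
    return np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)


def gc_fraction_np(encoded: np.ndarray) -> float:
    """GC fraction computed with array operations, no Python-level loop."""
    is_gc = (encoded == ord("G")) | (encoded == ord("C"))
    return float(is_gc.mean())


seq = encode("ATGCGCTA")
print(seq.dtype, seq.nbytes)   # one byte per base
print(gc_fraction_np(seq))     # 4 of the 8 bases are G or C
```

Whether this kind of layout pays off depends on the workload: it mostly helps when the same numeric operation runs over many bases or many sequences at once.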
Best regards, Simon
As always I think this boils down to: define "good"?
Biopython is pretty user-friendly, and has parts implemented in C for speed where it matters most.
It's not the fastest. It's not the smallest/neatest. It's probably not even that powerful compared to scikit-bio etc.
But does that make it not "good"?
Thank you for this quick reply.
Yeah, you're right about the definition of "good". I know for sure that Biopython is user-friendly and well documented. My concern is that it does not seem to be used as the backbone of any big published project handling biological sequences.
I think I'm confused about the scope and target audience of Biopython, but your comments about its performance are helping me.
Have a nice day!
I agree with Joe, it depends on the specifics of your project.
Biopython is definitely very popular and well documented. But depending on the set of tasks your software will be handling, another package might be more suitable (maybe even something outside the bioinformatics field). In the end, Biopython and BioNumPy are tools, and you need to select the proper one for your aims. You might even use both.
Biopython is pretty squarely aimed at the beginner and intermediate analyst, I think. The "proper" bioinformaticians who would be likely to incorporate it into a larger project are also likely to know how to code something robust from closer to the ground up, so user-friendliness becomes less important than performance.
Its real strength is in file format munging where it takes a lot of the heavy lifting away from end users.
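As a small illustration of that kind of format munging, here is a sketch that reads FASTA and writes it back out as tab-separated text with `Bio.SeqIO`. The two records are made up for the example, and it assumes Biopython is installed.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA input, held in memory instead of a file on disk.
fasta_text = ">seq1 example record\nATGCGT\n>seq2\nGGCC\n"

# Parse the FASTA text into SeqRecord objects.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
for record in records:
    print(record.id, len(record.seq))

# Write the same records back out in the simple "tab" format (id<TAB>sequence).
out = StringIO()
count = SeqIO.write(records, out, "tab")
print(count, "records written")
print(out.getvalue())
```

The same `parse`/`write` pair works with real file paths or handles, and with other format names that `SeqIO` supports, which is where most of the saved effort comes from.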
I think you'd be surprised how many projects incorporate it somewhere though, even if they don't depend heavily on it at their core.