Biopython Development on Ubuntu
6.4 years ago
ldaveyl ▴ 20

see comments for better explanation

Hi Everyone,

I’m a master's student in Bioinformatics and I’m interested in contributing code to Biopython. I'd like to work with GitHub on an Ubuntu virtual machine. However, when I fork the GitHub repository and make a local clone (on Ubuntu or Windows), I can't edit it because I get this error that I can't resolve. The only way I've been able to edit Biopython up until now is by working directly in the folder where other Python modules are located (...\Anaconda\Lib\site-packages\Bio...) after running "pip install biopython".

My question here is: how do I set up a good development environment for Biopython on an Ubuntu virtual machine with git?

Biopython Ubuntu Development • 2.9k views
ADD COMMENT
5

I mean this in the best way possible, but if you struggle with VMs, editing files in Ubuntu, executing python and using git then maybe you shouldn't be planning to make changes to biopython. You first need to be fluent in all those, and many more aspects. You are not going to learn this in a couple of days. Be humble, and take your time and do everything step by step.

ADD REPLY
2

What do you mean "the only way I can edit it"? They're just text files after all, it shouldn't matter what directory you are in.

Or do you mean you're having issues developing for it as imports aren't resolving?

ADD REPLY
0

Thank you for replying, I realize I did a poor job explaining what my problem is.

  1. I made a local clone of the forked biopython repository on Windows.
  2. I shared the folder between my Windows and Ubuntu VM.
  3. I set the $PYTHONPATH variable on Ubuntu to the folder which contains biopython ("sf_Stage") with this command: export $PYTHONPATH=/media/sf_Stage/

    If I then try to run biopython code, I get an ImportError:

        Traceback (most recent call last):
          File "test2.py", line 1, in <module>
            from Bio import SearchIO
        ModuleNotFoundError: No module named 'Bio'

  4. In response to that, I set the $PYTHONPATH variable to "/media/sf_Stage/biopython" but then I get this error: ImportError: cannot import name '_aligners': you should not import directly from the biopython source directory; please exit the source tree and re-launch your code from there.

In short: either I can't import Biopython because Ubuntu doesn't find it, or I get an error stating I can't import from the source directory.
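The two errors above can be told apart with a small check. The following is a minimal sketch, not Biopython's own tooling: the helper names `find_package_dirs` and `has_built_extension` are made up for illustration, and it assumes that `_aligners` is a compiled extension module living under `Bio/Align` in the source tree. The first helper shows that a search-path entry only works if it is the directory that directly contains `Bio/`; the second shows whether the compiled module has been built next to the sources.

```python
import importlib.machinery
import sys
from pathlib import Path


def find_package_dirs(name, search_path=None):
    """Return every directory on the search path that directly holds package `name`."""
    entries = sys.path if search_path is None else search_path
    hits = []
    for entry in entries:
        # An empty entry in sys.path means the current directory.
        candidate = Path(entry or ".") / name
        if (candidate / "__init__.py").is_file():
            hits.append(candidate)
    return hits


def has_built_extension(package_dir, stem):
    """True if a compiled extension module called `stem` exists in `package_dir`."""
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any((Path(package_dir) / (stem + suffix)).is_file() for suffix in suffixes)


if __name__ == "__main__":
    for pkg in find_package_dirs("Bio"):
        # Assumption: the extension from the error message sits in Bio/Align.
        built = has_built_extension(pkg / "Align", "_aligners")
        status = "compiled _aligners found" if built else "no compiled _aligners here"
        print(f"{pkg}: {status}")
```

If the script reports a `Bio` directory with no compiled `_aligners`, the interpreter is picking up a raw source tree. One common way to get the extensions built for a setuptools-based project is an editable install (`pip install -e .` run from the repository root); whether that suits a particular Biopython version is an assumption worth checking against the project's contributing guide.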

ADD REPLY
0

Why not clone the repo from within your VM instead of sharing it from Windows? Sharing can cause all sorts of errors with end-of-line characters etc.

You do not normally need to mess with the PYTHONPATH in Python these days.

Are you trying to install Biopython from the cloned repo rather than through pip etc? It’s not clear to me whether you’re trying to edit the installed version, or the forked repository...

ADD REPLY
0

My goal is to push changes to the source repository from my forked repository. I'm following this tutorial.

Just now I made a local clone of my fork in the Ubuntu VM but alas, python can't find biopython. The fork is cloned in the "Documents" folder.

ADD REPLY
4

You do not, and should not, attempt to push changes to the source repository for Biopython.

Instead, any edits you make should be to your forked repository, after which you can create a pull request, which the existing maintainers can decide to accept or reject. I believe this should also be directed at a dev branch of the Biopython repository, not the Master.

In your VM, I would advise installing conda and using virtualenvs/conda envs for your development. That will come with its own Python binary, which you can prioritise by editing your PATH variable (not PYTHONPATH). This will keep everything segregated and package managed.

At the moment I suspect you’re relying on a system python binary and getting your versions mixed up.
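A quick way to see which interpreter is actually running is to ask Python itself. This is a minimal sketch; `describe_interpreter` is a hypothetical helper name, and it relies on `CONDA_PREFIX` being the environment variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_interpreter(environ=None):
    """Summarise which Python is running and which environment settings affect imports."""
    environ = os.environ if environ is None else environ
    return {
        # Full path of the running interpreter (system Python vs. an environment's Python).
        "executable": sys.executable,
        # True inside a venv/virtualenv, where prefix and base_prefix differ.
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": environ.get("CONDA_PREFIX"),
        # Anything set here is added to the import search path.
        "pythonpath": environ.get("PYTHONPATH"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` points at something like the system interpreter while you expected the environment's own binary, the environment is probably not activated in that shell.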

ADD REPLY
0

To add to jrj.healey's point, these repositories will have permissions set so that only repository owners can make changes, so even if you tried to push changes, you'd be denied permissions. Follow jrj.healey's outline of fork -> pull forked repo -> make changes -> test, commit and push -> create pull request.
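The outline above can be written down as a dry run that only prints the git commands and does not execute anything. This is a sketch under assumptions: the fork URL, branch name, and commit message are placeholders, and `fork_workflow` is a made-up helper. The pull request itself is opened on GitHub afterwards.

```python
import shlex

# The upstream Biopython repository on GitHub.
UPSTREAM_URL = "https://github.com/biopython/biopython.git"


def fork_workflow(fork_url, branch, message):
    """Return the git commands for a fork-based contribution, as argument lists."""
    repo = "biopython"  # local directory name chosen for the clone
    return [
        ["git", "clone", fork_url, repo],
        # Keep a handle on the original repository so the fork can be updated later.
        ["git", "-C", repo, "remote", "add", "upstream", UPSTREAM_URL],
        ["git", "-C", repo, "checkout", "-b", branch],
        # Edit files and run the tests at this point, then record the changes.
        ["git", "-C", repo, "add", "-A"],
        ["git", "-C", repo, "commit", "-m", message],
        # Push to your own fork ("origin"), never to upstream.
        ["git", "-C", repo, "push", "-u", "origin", branch],
    ]


if __name__ == "__main__":
    # Placeholder values: replace with your own fork, branch, and message.
    commands = fork_workflow(
        "https://github.com/your-user/biopython.git",
        "my-feature",
        "Describe the change here",
    )
    for argv in commands:
        print(shlex.join(argv))
```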

ADD REPLY
2

Treat your VM as what it is - an isolated operating system. Once you start doing that and stop relying on any software resource the host OS might have, you will have an easier time.

Set up your development environment on the VM using apt-get. Install git, conda, etc., and then git clone the repository.

ADD REPLY
0

I tried what you all suggested, but I kept hitting the same roadblock: I can't make changes in my Biopython clone and also run the code.

So the only solution I see here is to develop in a different biopython and then when I know the code works I will make the changes in the clone and then push them to github.

ADD REPLY
1

You're telling me you cloned a repository by running a git clone from within your VM and you cannot edit the files that were downloaded by the git clone operation? That makes absolutely no sense, and it's either that or you did not "tried what you all suggested".

ADD REPLY
0

I can edit the files obtained from git clone, but then I can't run that code anymore because I get the error that I linked in the first post. I know, it makes no sense to me either, but I got the same problem in the virtual environment.

ADD REPLY
0

Well, it sounds like your edits are probably breaking the code then. Biopython is very large, and tinkering with one aspect of it may break it elsewhere in ways that you don't immediately realize. Have you run Biopython's tests before/after editing the code?
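Before comparing test runs, it helps to confirm that the `Bio` package Python finds is the edited clone and not a copy in site-packages. A minimal sketch follows; `is_inside` is a hypothetical helper, and the clone location `~/Documents/biopython` is an assumption based on the "Documents" folder mentioned earlier in the thread.

```python
import importlib.util
from pathlib import Path


def is_inside(module_file, repo_root):
    """True if `module_file` lives somewhere under `repo_root`."""
    module_path = Path(module_file).resolve()
    root = Path(repo_root).resolve()
    try:
        module_path.relative_to(root)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Assumed clone location; adjust to wherever the fork was cloned.
    clone = Path.home() / "Documents" / "biopython"
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec("Bio")
    if spec is None or spec.origin is None:
        print("Bio is not importable from this interpreter")
    else:
        print(spec.origin)
        print("imported from the clone:", is_inside(spec.origin, clone))
```

If the reported path is in site-packages, the tests are exercising the installed release and any edits made in the clone will have no effect on them.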

ADD REPLY
2
6.4 years ago

The comments have answered your git questions quite well, but I'd also like to point out that you don't need a VM to run Ubuntu on Windows anymore with the release of the Windows Subsystem for Linux. It works very well and I've had no issues running Biopython (or any other bioinformatics software). It allows you to leverage your full system resources rather than allocating only a few cores of your processor to a VM.

The suggestion of using virtualenvs through conda is also a very good idea.

ADD COMMENT
0

that is so useful to know, thanks!

ADD REPLY
