Linux Distros Best Suited For Bioinformatics?
13
15
12.9 years ago
Sat3Lite ▴ 150

Hello all,

I wanted to get a poll of what distros are currently popular with researchers working in bioinformatics and related biological data mining fields.

Mainly:

  • What Linux distribution do you currently use (or have you previously used)?
  • What software and language (packs) do you use on a daily basis?
  • Where does this distro outperform the rest? Where does it fall short?

I'll start: I currently use Ubuntu and have for the past two years. I mainly use vim with BioPerl scripts. Ubuntu, which is Debian-based, makes installing foreign libraries almost painless. It falls short when it comes to software updates.


linux • 35k views
ADD COMMENT
3

Please do a search before reposting. Besides, I think the question is not which distribution is best for bioinformatics, since that depends on what you are working on; none of the distributions may have the software and packages you need. It's like asking which pencil is better for taking class notes.

ADD REPLY
0

Agree with @raygozak. A lot of this is going to be about personal preference and the needs of the task at hand. The Linux world is huge, and so is the bioinformatics world.

ADD REPLY
0

I would recommend staying with Ubuntu for its recent and frequent updates and upgrades.

ADD REPLY
14
12.9 years ago

BioLinux works very well. It comes with a lot of bioinformatics-related software and is based on an Ubuntu system. You can also use an Ubuntu system directly, and use BioLinux as a repository for bioinformatics software.


ADD COMMENT
1

I would also like to point out http://cloudbiolinux.org/ for those who need high computational power but don't have the power of Greyskull on their side.

ADD REPLY
0

This seems to be a workstation distro. Any experience with running it on a server? We're currently using Debian for our server, but we might consider switching to something else.

ADD REPLY
10
12.9 years ago
Fabian Bull ★ 1.3k

Currently in use: Arch Linux

Software packages:

  • R (statistical analysis)
  • Scala (programming tasks)
  • Perl / BioPerl (simple pipelines)
  • blast / bwa (local alignments)
  • LaTeX / Inkscape (presentation and visualization)
  • Processing (visualization with Java)

Pros and cons: (decide yourself what is pro and what is con)

  • Very clean and lightweight
  • Hard to configure
  • Very configurable
  • Best-documented Linux
  • You learn how Linux works
  • Great community
  • Always current versions of packages
  • Rolling releases

Conclusion:

Perfect distro if you know what you are doing.

ADD COMMENT
0

Big plus: the AUR, where users can upload build scripts that everyone can use. There is quite a wide range of bioinformatics applications available: http://aur.archlinux.org/

ADD REPLY
0

Will compiling from scratch be an issue here, or are there binaries?

ADD REPLY
0

The AUR does require compiling, but this is done automatically by scripts. The binary repositories are not as big as Ubuntu's, but they are better maintained.

ADD REPLY
0

Compilation and installation are fully automatic for all AUR source packages. If they are not, this is a package bug and should be reported to the maintainer.

ADD REPLY
8
12.9 years ago
Pascal ★ 1.5k

Ubuntu. I agree with you that it is very comfortable for installing new software (most of it is Debian-package friendly). IMHO this is the best choice for the time being, as it keeps you from moving away from the mainstream.

FYI, Linux Mint is getting very popular, but this is not related to Bio.

BTW, do you know whether anyone has written a new source code editor since Vim? ;-)

ADD COMMENT
4

you mean emacs?

ADD REPLY
0

I'm a big fan of Ubuntu!

ADD REPLY
6
12.9 years ago
ALchEmiXt ★ 1.9k

Ubuntu and just add the support for BioLinux packages...works quite well for us. This allows us to keep our own custom server and just pull the tools preconfigured from biolinux if we need them.

ADD COMMENT
1

same experience for me

ADD REPLY
0

Also available as a bootable Live CD, and configurable to run inside a VM :-)

ADD REPLY
5
12.9 years ago

This article, "Linux distributions for bioinformatics: an update", describes several options, including DNALinux, a virtual machine appliance for those who want to run Linux within their existing OS.

ADD COMMENT
1

I don't think that article is up to date anymore; it was written in 2009.

ADD REPLY
5
12.9 years ago

As I just commented, Ubuntu (I prefer LTS in my case) plus the BioLinux packages is the solution I use, and I'm sure many others are good as well.

I think it could also be interesting to know which distributions are NOT good for bioinformaticians. Which distros should we avoid? I am thinking (just as an example) of CentOS, which a computer network guy (not bio at all) recommended to me for its stability. The problem for me was that the software ecosystem in bioinformatics is very active, and unfortunately it was really annoying to install or compile even simple things that needed Python, Ruby, the latest gcc, etc. So I gave up on CentOS and went back to Ubuntu, which, at least for the LTS releases, has had no more stability problems in my case...

ADD COMMENT
4
12.9 years ago
Guy ▴ 50

I believe any popular distro should be good enough. I say popular because most packages and software for Linux are tested thoroughly only on the distros that are most used.

If it helps: all the PhD students in the lab where I work use Ubuntu.

EDIT: For any serious work I would suggest using an LTS version, the latest one being Lucid.

ADD COMMENT
3
12.8 years ago

Deep in my heart I wish to say that the choice of your distribution is a complete non-issue. The tools you need should be available as regular packages of your distribution today. If not all, then most of them. And the community should embrace you when you come with your desire for an additional tool to be packaged or when you already come with those packages that you wish to bring back to the (re)distribution for the benefit of all. I tend to think that most if not all community-run distributions now work like that and Bioinformatics packages seem omnipresent for basic analyses in all distros. So just make your pick.

My personal choice was Debian, since the community started the distribution and still controls it. And anyone submitting a package to Debian is happy for every user of one of the many "derivative distros", like Ubuntu or Mint ... or BioLinux. After all, it is the community that I am after, not only the tools. A good part of us want to talk Biology, or Computational Biology for that matter, and which Debian derivative one uses is not of any concern: we can easily run everything everywhere, and all bioinformatics package names are identical between the .deb distributions. The latter point is the most important to me: how quickly can others join in practically with ideas, and how portable are parts of their workflows to my machine? The .deb world is big and open and collaborative.

So, with Debian we have quite an accepted server platform, BioLinux uses what Debian has and adds what it wants to add to it. Both share a repository with the DebianMed initiative (a somewhat better name IMHO may be DebianUbuntuBioLinuxMint initiative) where packaging efforts are communicated and shared between them. There are annual cross-distributional .deb sprints on Bioinformatics and then one knows each other from conferences over the years. There is no separation of DebianMed from the rest of Debian. When there is e.g. a Java package missing for a Bioinformatics package, it is added to the benefit of all Debian/Ubuntu/Mint/BioLinux users. And that transition is automated for the respective latest releases.

Particularly attractive is now an extension of BioLinux to CloudBioLinux, an Amazon Image auto-mounting relevant public datasets and with a prepared interface for Galaxy, a web-based workflow engine. Neat. There is also a Debian variant of CloudBioLinux. Also neat. Why? I am not completely sure. Most likely for two reasons: a) it is technically simple and b) there is a particular scientific and technical beauty in knowing everything on an image to be buildable, i.e. inspectable, from source code.

The enemy is not the distribution. It is time. Software changes rapidly. And while the LTS (long term support) releases of Ubuntu are great, it becomes increasingly difficult to maintain current bioinformatics software with an older Python and older everything else. And when someone packages for Debian (where packaging is always performed against the very latest version), sharing that effort with other distros or with the same distro's earlier releases is not always straightforward. One needs to backport. From my observation, this backporting is done best by BioLinux.

When equipped with root privileges, there is the concept of "chroot" environments, which lets you install any distribution inside any other. For instance, for many years I ran Debian inside SuSE because of all the bioinformatics it ships, including a second X interface that I reached with SHIFT-ALT-F8. So, keep whatever you need for a production environment chrooted, or just do not touch it once it is up. You can have multiple such chroots in parallel. Also learn about the "hold" option of dpkg (the package manager of .deb distros), which prevents accidental updates of packages when a new version arrives.
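The "hold" option can be sketched roughly as follows. This is only an illustration: bwa is an arbitrary example package, hold_package and DRY_RUN are hypothetical names made up for this sketch, and the real command needs root on a .deb-based system, so by default the sketch just prints what it would run.

```shell
# Mark a package as "hold" so that upgrades skip it.
# With DRY_RUN=1 (the default in this sketch) the command is only printed;
# set DRY_RUN=0 and run as root to actually apply it.
hold_package() {
    pkg="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'echo "%s hold" | dpkg --set-selections\n' "$pkg"
    else
        echo "$pkg hold" | dpkg --set-selections
    fi
}

hold_package bwa    # bwa is just an example package name
```

Afterwards, dpkg --get-selections shows the selection state of each package, so held packages can be checked there. On current Debian and Ubuntu releases, apt-mark hold and apt-mark unhold should serve as a front end for the same thing.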

My personal suggestion is to take any Debian derivative. It does not matter which one it is. You can easily change between them; e.g. I incrementally "downgraded" my desktop from Ubuntu to Debian by adding the Debian package repositories and removing the Ubuntu ones. Not that I would ultimately suggest that to everyone, but rest assured that compatibility is higher than one might think. For an overview, have a look at this list of "bio" packages in Debian/Ubuntu.

Good luck,

Steffen

ADD COMMENT
0

My personal suggestion would be to try the Debian suite that suits you best. Want stability (e.g. in a cluster computing environment)? Try stable. Want fresh versions? Try testing or even unstable to get all the new goodness DebianMed brings with it.

chroot solutions/workarounds are indeed under-utilized. I quite often rely on debootstrap + schroot tandem for various reasons, e.g. to run software not supported by the specific Debian suite like http://neuro.debian.net/blog/2011/2011-12-12_schroot_fslview.html
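A rough sketch of the debootstrap step, for anyone who has not tried it. The suite name, target directory and mirror below are assumptions chosen for illustration, and run and DRY_RUN are hypothetical helper names. The sketch uses plain chroot to enter the environment, since schroot needs its own configuration entry that is not shown here. By default it only prints the commands.

```shell
# Dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumed values, for illustration only.
SUITE="${SUITE:-stable}"
CHROOT_DIR="${CHROOT_DIR:-/srv/chroot/$SUITE}"
MIRROR="${MIRROR:-http://deb.debian.org/debian}"

# 1. Bootstrap a minimal Debian system into the target directory.
run sudo debootstrap "$SUITE" "$CHROOT_DIR" "$MIRROR"

# 2. Enter it with plain chroot and get a shell inside.
run sudo chroot "$CHROOT_DIR" /bin/bash
```

Setting SUITE=testing or SUITE=unstable would bootstrap one of the other suites in the same way, which is one way to keep several chroots side by side.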

ADD REPLY
2
12.9 years ago
sklages ▴ 170

I compile most bioinformatics packages on my own. I don't use any repos for bioinformatics software packages. I had a lot of problems with Ubuntu-based distributions; they seem to do a lot of things "their own way". Compiling packages like Staden is quite a hassle under Ubuntu.

In my hands, any RedHat-based distribution works fine for compiling from source: CentOS is RHEL and thus a bit conservative, but works fine. Privately and for testing I use bleeding-edge Fedora (at work, an in-house developed Linux distribution).

My 2p :-) Sven

ADD COMMENT
2
12.9 years ago
User 9501 ▴ 30

On any given day I might/others might git clone and run/write code on CentOS, Mint Debian, OSX and RHEL. The only sane solution is to export PATH=/home/myname/mydir:$PATH on all my boxes and my cluster node, then install the same dependencies across workstations. It works for me, I can control exactly which version of the library I want to use and updating them isn't that hard.
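As a minimal sketch of that PATH setup, something like the following could go into a shell startup file on each machine. The function name prepend_path and the TOOLS_DIR variable are made up for this example, and $HOME/mydir stands in for the /home/myname/mydir directory mentioned above. The check simply avoids adding the same directory twice when the file is sourced repeatedly.

```shell
# Prepend a directory to PATH, but only if it is not already listed.
prepend_path() {
    dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;                  # already present, leave PATH alone
        *) PATH="$dir:$PATH" ;;         # put the personal directory first
    esac
    export PATH
}

# Hypothetical location of the self-installed tools and libraries.
TOOLS_DIR="${TOOLS_DIR:-$HOME/mydir}"
prepend_path "$TOOLS_DIR"
```

Because the personal directory comes first, a tool installed there takes precedence over the version shipped by the distro, which is what makes the same setup behave the same on CentOS, Debian or OSX.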

Any Linux distribution will do; I like Gentoo or Mint Debian.

Gentoo has an active bioinformatics community. They also have very up-to-date software. Everything is compiled to your liking and for your hardware. When properly configured, I was surprised how much faster my software was with aggressive flags and modern instructions on 4-core CPUs. It is a significant speedup compared to the "run anywhere" packages in most distros' repos. Numerical libs like ATLAS and BLAS see speedups of 10x or much greater.

Mint Debian/Debian/Ubuntu have a large community, so by extension they have a large bioinformatics community. There is a boatload of external repos that are great. Neurodebian is pretty good. They are all easy to install, not a timesink to update, and very stable. In other words, they don't get in my way.

ADD COMMENT
2
12.8 years ago

You might like to take a look at the packages bundled into official Debian by the Debian Med team. An overview is available at http://debian-med.alioth.debian.org/tasks/bio. These packages are maintained by a constantly growing team of bioinformaticians who are members of the Debian Med team. All packages are available via the official Debian mirrors and, as a consequence, are available in Ubuntu as well. The team works closely with the BioLinux team, whose distribution is based on Ubuntu LTS. This means that packages which are currently only available in BioLinux will be moved directly into Debian and finally also end up in official Ubuntu.

ADD COMMENT
1
12.9 years ago

Personally, I don't like the new Ubuntu interface (Unity). I prefer something simpler and more similar to Windows, like the Gnome interface; it is easier for a new student to work with something familiar. Linux Mint is also nearly 100% compatible with Ubuntu software.

ADD COMMENT
0
9.0 years ago
5heikki 11k

I'm getting a new workstation. Specs include 2 x Xeon E5-2630 v3 (altogether 32 threads), 128GB RAM, 512GB SSD and 2 x 2TB HDD for storage. I'm thinking I'll set up either the latest Ubuntu LTS or CentOS on it. EOL for the latest Ubuntu LTS is April 2019. EOL for CentOS 7 is June 2024. Both of these are fine. Any strong opinions towards one of these distros or some alternative? I'm not interested in distros with short life cycles. Debian Stable is an option, although I'm a little bit worried about the relatively old versions of many of its packages (which may bite in the form of dependency problems). I prefer Gnome over KDE, but in the end the desktop environment doesn't matter that much since my screens will mostly be filled with terminal windows anyway. I'll build most stuff from source and don't see much difference between apt and yum.

ADD COMMENT
0

I have never tried it myself, but check out Bio-Linux (based on Ubuntu LTS).

They added some bioinformatics software packages, but I am not sure how up-to-date the tools are. Have a look at the software list on their webpage.

ADD REPLY
