Question

Which Bioinformatics Tools Are Written In Python

16

13.0 years ago

Chen Sun ★ 1.1k

Which bioinformatics tools are written in Python?

I ask because new bioinformatics programmers and new Python users like me can read the source code to see how Python is used to tackle complex bioinformatics problems, beyond the problems solved in books such as "Beginning Python for Bioinformatics".

Thank you

python • 29k views

ADD COMMENT • link updated 2.1 years ago by Ram 45k • written 13.0 years ago by Chen Sun ★ 1.1k

score 25 · Answer 1 · 2012-08-08

There are so many! To get you started:

Biopython: set of freely available tools for biological computation
PyMOL: molecular visualization system
PyCogent: a software library for genomic biology
Galaxy: an open, web-based platform for data intensive biomedical research
pygr: sequence and comparative genomics analyses, even with extremely large multi-genome data sets
Biskit: facilitates the manipulation and analysis of macromolecular structures, protein complexes, and molecular dynamics trajectories
Ruffus: a lightweight python module for running computational pipelines
Pysam: for reading and manipulating SAM files
msatcommander: locates microsatellite (SSR, VNTR, &c) repeats within fasta-formatted sequence or consensus files
glu-genetics: tools to store, clean, and analyze data generated by whole-genome or candidate gene association scans
PySCeS: a variety of tools for the analysis of cellular systems
OpenAlea: modules to analyse, visualize and model the functioning and growth of plant architecture
ETE: assists in the automated manipulation, analysis and visualization of phylogenetic and other types of trees
bx-python: allows for rapid implementation of genome scale analyses
RSeQC: comprehensively evaluates high-throughput sequence data, especially RNA-seq data
incf-omni: analysis and simulation construction of the nervous system
genetrack: storing, querying and visualizing genomic interval oriented data
chimerascan: detection of chimeric transcripts in high-throughput sequencing data

Since you're new to the field of bioinformatics, you might also be interested in:

ANGUS, a site built around the 2010 course on Analyzing Next-Generation Sequencing Data. It contains a number of detailed tutorials on mapping, assembly, mRNAseq, ChIP-seq, and resequencing analysis using Python.
this article by Peter Norvig on species barcoding

To give another example of the very valid point that Dk made: the company I work for (Applied Maths) sells a bioinformatics software suite called BioNumerics. The core of the program is written in C++, but Python is used to customize the software to specific clients' needs:

to create custom reports,
to import and export non-standard formats,
to automate series of actions that are executed repeatedly,
to perform custom calculations, etc.

score 3 · Answer 2 · 2012-08-08

13.0 years ago

Damian Kao 16k

Most commonly used tools are written in compiled languages like C or Java, simply because they run faster and because the ability to access low-level memory resources is crucial when analyzing large amounts of data. When Python is used in these packages, it is usually in the form of 'pipeline glue'.

TopHat (http://tophat.cbcb.umd.edu/) is a perfect example of that. It consists of several smaller programs written in C. Python is then used to interpret user parameters and run the smaller programs in sequence.
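To make the 'pipeline glue' idea concrete, here is a minimal sketch of such a wrapper. It is not TopHat's actual code: the function names, the argument names, and the three steps are all hypothetical, and each step is a stand-in that only prints a message where a real wrapper would call its compiled executables.

```python
import argparse
import subprocess
import sys


def build_steps(args):
    """Return the ordered commands of the pipeline (hypothetical stand-ins)."""
    py = sys.executable
    return [
        # A real wrapper would list its compiled programs here instead.
        [py, "-c", "import sys; print('index', sys.argv[1])", args.reference],
        [py, "-c",
         "import sys; print('align', sys.argv[1], 'threads=' + sys.argv[2])",
         args.reads, str(args.threads)],
        [py, "-c", "print('report')"],
    ]


def run_pipeline(argv=None):
    """Parse user parameters, then run every step in sequence."""
    parser = argparse.ArgumentParser(description="Toy pipeline wrapper")
    parser.add_argument("reference")
    parser.add_argument("reads")
    parser.add_argument("--threads", type=int, default=1)
    args = parser.parse_args(argv)

    outputs = []
    for cmd in build_steps(args):
        # check=True stops the pipeline as soon as one step fails.
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout.strip())
    return outputs


print(run_pipeline(["ref.fa", "reads.fq", "--threads", "4"]))
```

The Python layer does no heavy computation here; it only validates the parameters and decides which program runs when, which is the division of labour described above.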

Interpreted languages like Python or Perl are usually used for format conversions or statistics reporting.

A good place to start for real examples is to read up on Biopython (http://biopython.org/wiki/Biopython). Their tutorials have tons of real-life examples. You can come up with small projects for yourself, like writing a script that analyzes the GC content of a FASTA file, or a script that parses a BLAST output file and filters on various criteria.
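As a rough sketch of the first small project, the script below computes GC content per record using only the standard library. The helper names `read_fasta` and `gc_content` and the two example records are made up for illustration; counting only A, C, G and T in the denominator is one possible convention, not the only one.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


# Made-up example records; a real script would iterate over an open file.
example = """>seq1
ATGC
GGCC
>seq2
ATATNN
""".splitlines()

for name, seq in read_fasta(example):
    print(f"{name}\t{gc_content(seq):.2f}")
```

With Biopython installed, `Bio.SeqIO.parse(handle, "fasta")` would replace the hand-written parser.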

ADD COMMENT • link 13.0 years ago by Damian Kao 16k

1

I will chime in to say QIIME (http://www.qiime.org) is another example.

ADD REPLY • link 13.0 years ago by Cliff Beall ▴ 480

1

I believe you're right about the speed consideration: the ability of C or C++ to access low-level memory lets one tune a program as close to the hardware as possible (one can also try assembler). But I'm sure that the way a specific task is coded matters more. Look at, for instance, this Biostar discussion: http://www.biostars.org/post/show/10353/how-to-efficiently-parse-a-huge-fastq-file/ (Leszek's answer). For the purposes of this thread I would say: Python is good, but use dict() and set() types instead of lists whenever you can.
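The dict()/set() advice can be illustrated with a small membership test. This is only a sketch with invented read IDs: a lookup in a list scans the elements one by one, while a set lookup is a hash lookup, so the gap widens as the collection grows. The timings depend on the machine, so none are quoted here.

```python
import timeit

# Invented read IDs, stored once as a list and once as a set.
wanted_list = [f"read{i}" for i in range(10_000)]
wanted_set = set(wanted_list)

# 1000 query IDs, half of which are present in the collection.
queries = [f"read{i}" for i in range(0, 20_000, 20)]


def count_hits(queries, wanted):
    """Count how many query IDs occur in the `wanted` collection."""
    return sum(1 for q in queries if q in wanted)


# Both containers give the same answer; only the lookup cost differs.
assert count_hits(queries, wanted_list) == count_hits(queries, wanted_set)

t_list = timeit.timeit(lambda: count_hits(queries, wanted_list), number=1)
t_set = timeit.timeit(lambda: count_hits(queries, wanted_set), number=1)
print(f"list: {t_list:.4f} s   set: {t_set:.4f} s")
```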

ADD REPLY • link 13.0 years ago by Manu Prestat 4.1k

1

My answer is malformatted due to the transition of the website. If you read my answer together with the reformatted table in a separate answer, you will see that a proper C/C++ implementation is 4-fold faster than Leszek's script. The C++ one is slow due to a stdio synchronization issue which I only learned about recently. Also, each data structure has its own use. It is just that in that example dict() is the better choice.

ADD REPLY • link 13.0 years ago by lh3 33k

0

OK, I see. When I last read that answer properly, the race for the best implementation was not over yet :-) However, that still supports the fact that the way of coding is very critical, whatever the programming language. That is a very good post; I like Biostar especially for that kind of thing. BTW, I'll go and compile Pierre's code right away.

ADD REPLY • link 13.0 years ago by Manu Prestat 4.1k

0

On formatting: a new fix is incoming and will most likely be applied over the weekend.

ADD REPLY • link 13.0 years ago by Istvan Albert 103k

score 1 · Answer 3 · 2012-08-08

13.0 years ago

Adam ★ 1.0k

The short-read mapper, Stampy, is written in Python. http://www.well.ox.ac.uk/project-stampy

ADD COMMENT • link 13.0 years ago by Adam ★ 1.0k

score 0 · Answer 4 · 2012-08-08

13.0 years ago

Woa ★ 2.9k

I would suggest searching Google or Google Scholar for your topic of interest plus something like "python script" or "python code", e.g.:

protein structure superposition + "python script"

ADD COMMENT • link 13.0 years ago by Woa ★ 2.9k

score 0 · Answer 5 · 2012-08-08

13.0 years ago

Manu Prestat 4.1k

The Biopieces suite is written in Python and Ruby.

ADD COMMENT • link 13.0 years ago by Manu Prestat 4.1k

1

Very little is in Python yet. Most is Perl and Ruby.

ADD REPLY • link 12.7 years ago by Martin A Hansen 3.0k

score 0 · Answer 6 · 2014-08-11

11.0 years ago

sgruenwald ▴ 10

A good way to find interesting modules is to search the Python Package Index (PyPI), the repository that pip installs from:

https://pypi.python.org/pypi?%3Aaction=search&term=bio&submit=search

ADD COMMENT • link 11.0 years ago by sgruenwald ▴ 10

Ram · Answer 7 · 2014-08-11

0

11.0 years ago

Chris Evelo 10k

Our Go-Elite Gene Ontology and general gene set overrepresentation analysis tool is written in Python. It was described here.

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Chris Evelo 10k

score 0 · Answer 8 · 2019-12-27

5.6 years ago

Oommen K Mathew PhD • 0

The best way to find Python modules for bioinformatics is to use the classifier option (https://pypi.org/classifiers/) on PyPI.

https://pypi.org/search/?q=&o=&c=Topic+%3A%3A+Scientific%2FEngineering+%3A%3A+Bio-Informatics
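For other classifiers, a search link of the same shape can be built with the standard library. This is a small sketch: the helper name `pypi_classifier_search_url` is made up, and the parameter names `q`, `o` and `c` are simply copied from the URL above rather than taken from any documented API.

```python
from urllib.parse import urlencode


def pypi_classifier_search_url(classifier, query=""):
    """Build a PyPI search URL filtered by one trove classifier."""
    # Parameter names mirror the URL quoted above (an assumption, not an API).
    params = {"q": query, "o": "", "c": classifier}
    return "https://pypi.org/search/?" + urlencode(params)


url = pypi_classifier_search_url(
    "Topic :: Scientific/Engineering :: Bio-Informatics"
)
print(url)
```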

ADD COMMENT • link 5.6 years ago by Oommen K Mathew PhD • 0