Question

Hidden Markov Models In Genomic Analysis

9

13.6 years ago

didymos ▴ 210

Hi All,
I would like to use Hidden Markov Models to investigate some genomic properties (DNA breaking points). Do you know of any good literature and/or tutorials on implementing HMMs in Python or R (Bioconductor), especially for sequence analysis? I would be grateful for any comments and suggestions.

r python bioconductor • 8.7k views

ADD COMMENT • link updated 10.9 years ago by Woa ★ 2.9k • written 13.6 years ago by didymos ▴ 210

Ram · Answer 1 · 2011-09-29

13

13.6 years ago

Gjain 5.8k

Hi,

This one explains HMM with examples:

http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/main.html

and this article in Nature Biotechnology explains it with a biological background:

http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html

Here is the R package for HMM

http://cran.r-project.org/web/packages/HMM/HMM.pdf

hope this helps.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 13.6 years ago by Gjain 5.8k

3

For a more detailed explanation you could also look at the following book: Durbin, R. et al. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK. http://books.google.com/books/about/Biological_sequence_analysis.html?id=R5P2GlJvigQC

ADD REPLY • link 13.6 years ago by Jonasr ▴ 120

2

The nbt article is a good start, and Richard's book is definitely worth reading.

ADD REPLY • link 13.6 years ago by Boboppie ▴ 550

0

oh yes ... that book is very good. thanks

ADD REPLY • link 13.6 years ago by Gjain 5.8k

score 4 · Answer 2 · 2011-09-29

4

13.6 years ago

lh3 33k

For genomic data (at least for applications I am familiar with), you need very efficient implementations. The core should really be written in a low-level language such as C. Here are comments on a few existing implementations:

BioPerl has an HMM implementation with the core written in C.
The R package HMM sounds right, but it is written purely in R (one of the slowest, if not the slowest, scripting languages) without using matrix operations, and in a very inefficient way. It is probably going to be >1000X, if not >10000X, slower than a proper C implementation.
BioPython also has an HMM module, but it is apparently written purely in Python. Python is typically ~50-100X slower than C for such an application.
If you do not want to implement an HMM yourself, you may consider Ewan Birney's dynamite. It is able to generate code for very complex HMMs. The famous GeneWise is built upon that.
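
To illustrate what "using matrix operations" means in practice, here is a minimal sketch of Viterbi decoding in Python with NumPy, working in log space so that long sequences do not underflow. It is not taken from any of the packages above. The two-state model (0 = AT-rich, 1 = GC-rich), its probabilities, and the function name `viterbi` are all made up for illustration. The loop over positions is still plain Python, so this would likely remain far slower than a C core on genome-scale data.

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most likely hidden-state path for an integer-encoded sequence.

    obs:       (T,) symbol indices
    log_start: (K,) log initial state probabilities
    log_trans: (K, K), log_trans[i, j] = log P(next state j | state i)
    log_emit:  (K, M), log_emit[k, s] = log P(symbol s | state k)
    """
    obs = np.asarray(obs)
    T, K = len(obs), len(log_start)
    score = log_start + log_emit[:, obs[0]]
    back = np.zeros((T, K), dtype=np.int64)
    for t in range(1, T):
        # cand[i, j]: best log-probability of being in i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand.max(axis=0) + log_emit[:, obs[t]]
    # Trace back from the best final state
    path = np.empty(T, dtype=np.int64)
    path[-1] = np.argmax(score)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path, float(score.max())

# Hypothetical toy model: state 0 = AT-rich, state 1 = GC-rich; symbols A, C, G, T
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1],
                    [0.1, 0.9]])
log_emit = np.log([[0.4, 0.1, 0.1, 0.4],
                   [0.1, 0.4, 0.4, 0.1]])

seq = "ATATATGCGCGC"
obs = np.array(["ACGT".index(c) for c in seq])
path, logp = viterbi(obs, log_start, log_trans, log_emit)
print(path, logp)
```

On this toy input the decoded path should switch from state 0 to state 1 at the AT/GC boundary, which is the kind of segmentation one would look for when modelling break points. The inner maximisation over previous states is done as a single K x K array operation per position instead of nested loops.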

Richard's book is the best as a tutorial.

ADD COMMENT • link 13.6 years ago by lh3 33k

0

Richard's book? Pointers pls

ADD REPLY • link 13.6 years ago by Hranjeev ★ 1.5k

0

'Richard's book' is the one I mentioned in the comments to Gjain's answer. Richard Durbin.

ADD REPLY • link 13.6 years ago by Jonasr ▴ 120

0

Just thought the claims on speed here might be worth looking into:

http://www.ll.mit.edu/HPEC/agendas/proc03/pdfs/nehrbass.pdf

ADD REPLY • link 13.6 years ago by Delinquentme ▴ 200

score 2 · Answer 3 · 2011-09-29

I would start with the Wikipedia article on HMMs, followed by articles by Sean Eddy that describe HMMs from the perspective of applications in biology (not genomic analysis as such).

Regarding implementing an HMM to predict genomic properties, I would recommend taking a look at the various algorithms developed to predict transcription factor binding sites (see 1, 2, 3, etc.; disclaimer: I am a co-author of STIF). Also, I remember that old versions of the HMMER package accept nucleotide sequences, so looking at the source code will be helpful for you. Also see recent implementations of HMMs for predicting genomic properties (1, 2) other than TFBS or motifs.

Ram · Answer 4 · 2014-06-09

1

10.9 years ago

Woa ★ 2.9k

I just came across this good tutorial paper: Seven things to remember about hidden Markov models: A tutorial on Markovian models for time series

ADD COMMENT • link updated 5.3 years ago by Ram 45k • written 10.9 years ago by Woa ★ 2.9k