Hidden Markov Models In Genomic Analysis
13.2 years ago
didymos ▴ 210

Hi All,
I would like to use Hidden Markov Models to investigate some genomic properties (DNA breakpoints). Do you know of any good literature and/or tutorials on implementing HMMs in Python or R (Bioconductor), especially for sequence analysis? I would be grateful for any comments and suggestions.

r python bioconductor
13.2 years ago
Gjain 5.8k

Hi,

This one explains HMMs with examples:

And this article in Nature Biotechnology explains them with a biological background:

Here is the R package for HMMs:

Hope this helps.
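For a quick feel of what those tutorials cover, here is a minimal sketch of Viterbi decoding in plain Python. It assumes a toy two-state model (hypothetical "AT_rich" and "GC_rich" states with made-up probabilities), not a model taken from the linked material. Positions where the decoded state changes are the kind of segment boundaries one might inspect as candidate breakpoints.

```python
import math

# Toy two-state HMM for segmenting DNA. States and probabilities are
# hypothetical, chosen for illustration only (not estimated from real data).
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT = {
    "AT_rich": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "GC_rich": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq, states=STATES, start=START, trans=TRANS, emit=EMIT):
    """Return (most probable state path, its log-probability) for seq.

    Works in log space so that long sequences do not underflow.
    """
    log = math.log
    # v[s] = best log-probability of any path ending in state s at the current position
    v = {s: log(start[s]) + log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        new_v, ptr = {}, {}
        for s in states:
            # pick the best predecessor state for s at this position
            prev = max(states, key=lambda p: v[p] + log(trans[p][s]))
            new_v[s] = v[prev] + log(trans[prev][s]) + log(emit[s][symbol])
            ptr[s] = prev
        v = new_v
        backpointers.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


def boundaries(path):
    """Hypothetical helper: indices where the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

For example, `viterbi("ATATAT" + "GCGCGCGC")` decodes the first six bases as `AT_rich` and the last eight as `GC_rich`, so `boundaries(path)` returns `[6]`. In a real analysis the parameters would be estimated from data (e.g. with Baum-Welch) rather than set by hand.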

For a more detailed explanation you could also look at the following book: Durbin, R. et al. (1998) Biological Sequence Analysis. Cambridge University Press, Cambridge, UK. http://books.google.com/books/about/Biological_sequence_analysis.html?id=R5P2GlJvigQC

The nbt article is a good start, and Richard's book is definitely worth reading.

Oh yes, that book is very good. Thanks.

13.2 years ago
lh3 33k

For genomic data (at least for the applications I am familiar with), you need very efficient implementations. The core should really be written in a low-level language such as C. Here are comments on a few existing implementations:

  • BioPerl has an HMM implementation with the core written in C.

  • The R package HMM sounds right, but it is written purely in R (one of the slowest, if not the slowest, scripting languages), does not use matrix operations, and is implemented very inefficiently. It is probably >1000X, if not >10000X, slower than a proper C implementation.

  • BioPython also has an HMM module, but it is apparently written purely in Python. Python is typically ~50-100X slower than C for such an application.

  • If you do not want to implement an HMM yourself, you may consider Ewan Birney's Dynamite. It can generate code for very complex HMMs; the well-known GeneWise is built upon it.
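To make the implementation-cost point concrete, here is a sketch of the forward algorithm (total likelihood of a sequence under an HMM) in plain Python, working in log space to avoid underflow on long sequences. The function names and the integer encoding of bases (A, C, G, T as 0 to 3) are assumptions for illustration. Each position costs O(N^2) work for N states, so a pure-Python loop like this is fine for learning, but as argued above, genome-scale data calls for a C core or at least vectorized matrix operations.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, log_start, log_trans, log_emit):
    """Log-likelihood log P(obs) of an HMM via the forward algorithm.

    obs       : list of integer symbol indices (hypothetical encoding, e.g. A,C,G,T = 0..3)
    log_start : log_start[i]    = log P(first state = i)
    log_trans : log_trans[i][j] = log P(next state = j | current state = i)
    log_emit  : log_emit[i][k]  = log P(symbol k | state i)
    """
    n = len(log_start)
    # alpha[j] = log P(obs[0..t], state at position t = j)
    alpha = [log_start[j] + log_emit[j][obs[0]] for j in range(n)]
    for k in obs[1:]:
        # sum over all predecessor states i, then emit symbol k from state j
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n)]) + log_emit[j][k]
            for j in range(n)
        ]
    # total likelihood: sum over the possible final states
    return logsumexp(alpha)
```

A simple sanity check is that, for any valid parameters, the likelihoods of all sequences of a fixed length sum to 1.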

Richard's book is the best as a tutorial.

Richard's book? Pointers, please.

'Richard's book' is the one I mentioned in the comments to Gjain's answer. Richard Durbin.

Just thought the claims on speed here might be worth looking into:

http://www.ll.mit.edu/HPEC/agendas/proc03/pdfs/nehrbass.pdf

13.2 years ago

I would start with the Wikipedia article on HMMs, followed by Sean Eddy's articles that describe HMMs from the perspective of applications in biology (not genomic analysis as such).

Regarding implementing HMMs to predict genomic properties, I would recommend taking a look at the various algorithms developed to predict transcription factor binding sites (TFBS) (see 1, 2, 3, etc.; disclaimer: I am a co-author of STIF). Also, I remember that old versions of the HMMER package accepted nucleotide sequences, so looking at its source code would be helpful for you. See also recent HMM implementations for predicting genomic properties other than TFBS or motifs (1, 2).

10.5 years ago
Woa ★ 2.9k

I just came across this good tutorial paper: "Seven things to remember about hidden Markov models: A tutorial on Markovian models for time series".
