I recently heard a talk about using machine learning in ChIP-Seq data analysis and binding site prediction, and got interested. I wasn't able to find any solid reviews or examples of what one can do and achieve with it. How powerful is it in this context, and are the results reliable enough? Could you shed some light on
the aspect of using machine learning with regard to ChIP-Seq (and/or RNA-Seq),
what to expect and,
last but not least, the advantages/disadvantages?
Here are some pointers on transcription factor binding prediction and ChIP-seq methods. Most of them are not strictly machine learning papers, but each employs one ML method or another to discover "new biology".
Predicting binding from sequence
Weirauch, M.T. et al., 2013. Evaluation of methods for modeling transcription factor sequence specificity. Nature Biotechnology.
Zambelli, F., Pesole, G. & Pavesi, G., 2009. Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids Research, 37(Web Server), pp.W247–W252.
ChIP-seq based
Gerstein, M.B. et al., 2012. Architecture of the human regulatory network derived from ENCODE data. Nature, 489(7414), pp.91–100.
Wang, J. et al., 2012. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Research, 22(9), pp.1798–1812.
Whitfield, T.W. et al., 2012. Functional analysis of transcription factor binding sites in human promoters. Genome Biology, 13(9), p.R50.
Abstracting tissue specificity by DNase hypersensitivity
Arvey, A. et al., 2012. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Research, 22(9), pp.1723–1734.
Natarajan, A. et al., 2012. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Research, 22(9), pp.1711–1722.
Pique-Regi, R. et al., 2011. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Research, 21(3), pp.447–455.
Thanks for the pointers, Michael. I should read more!!