I'm performing some meta-analysis on gene-expression data from microarrays, and am looking through some of the techniques used to do this. One thing that crops up often is the use of latent variable models.
They are used either on a per-gene basis to calculate the probability of differential expression, as in Choi et al., or in a dimension-reduction scheme to highlight groups of "signature" genes, as in Martoglio et al.
Both of these latent-variable-based approaches appeal to me, probably because of my Machine Learning background: in both cases, the model the authors use to define differential expression makes more sense to me than the more traditional statistical methods (*x*-tests, ranking).
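To make the per-gene idea concrete, here is a minimal sketch of a latent-variable model for differential expression. It is not the model from Choi et al. or Martoglio et al.; it is a toy two-component mixture fitted with EM, where the latent variable is a per-gene indicator of "differentially expressed or not". It assumes each gene has already been summarised as a z-score, that null genes follow N(0, 1), and that differentially expressed genes follow a wider zero-mean normal. The names `posterior_de` and `alt_sd`, the value `alt_sd=3.0`, and the example z-scores are all made up for illustration.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def e_step(z_scores, pi_de, alt_sd):
    """Posterior probability that each gene belongs to the DE component."""
    posteriors = []
    for z in z_scores:
        p_de = pi_de * normal_pdf(z, alt_sd)          # assumed DE component
        p_null = (1.0 - pi_de) * normal_pdf(z, 1.0)   # assumed null component
        posteriors.append(p_de / (p_de + p_null))
    return posteriors


def posterior_de(z_scores, alt_sd=3.0, n_iter=100):
    """Fit the mixing proportion by EM; return (pi_de, per-gene posteriors)."""
    pi_de = 0.5  # initial guess for the fraction of DE genes
    for _ in range(n_iter):
        posteriors = e_step(z_scores, pi_de, alt_sd)
        # M-step: the DE fraction is the mean posterior membership
        pi_de = sum(posteriors) / len(posteriors)
    # Recompute posteriors so they match the final estimate of pi_de
    return pi_de, e_step(z_scores, pi_de, alt_sd)


# Hypothetical z-scores: six unremarkable genes and two extreme ones
z_scores = [0.1, -0.3, 0.5, -0.8, 0.2, 4.5, -5.0, 0.0]
pi_de, posteriors = posterior_de(z_scores)
for z, p in zip(z_scores, posteriors):
    print(f"z = {z:5.1f}  P(DE | z) = {p:.3f}")
```

The appeal, at least to me, is that the output is a probability per gene rather than a rank or a p-value; the cost is that choices like the shape of the alternative component have to be justified.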
However, I'm trying to embark on a pragmatism-not-idealism approach to work (and actually get something done), and I know that latent variable models can be a lot of effort sometimes. My questions are, therefore:
- Does anyone have any "good" experiences using latent-variable modelling approaches for differential expression analysis in microarrays? For example, a case where a latent-variable model outperformed a more standard approach like SAM, or did better at meta-analysis than RankProd.
- Does anyone have a feel for how easy latent variable models are to explain to biologist collaborators? Is the richer model worth the effort of explaining it?
- Is there a 'standard' R package that is used more than others for this kind of analysis? Typically, when meta-analysis comes up, people mention RankProd. Is there an equivalent package that the community recommends for latent-variable-based approaches?
Could you maybe ask this on the Bioconductor mailing list as well, just in case you don't get an answer here?
I haven't tried any of these approaches, but the next time I have some gene-expression data to play with I will have a go. Personally, I think that a better/richer model is always worth a go. Some biologists I know just care that the results are good and don't care about the model, although most biologists with experience of microarrays should know about PCA, and it's not that big a leap from there to ICA and similar approaches. Sorry I can't help or suggest much else, but I would be really interested in hearing how you get on (if you choose this path).
It's an interesting question and I'd also like to hear how things turn out. It's not something I have experience with, but lavaan and OpenMx both look interesting.
Perhaps it's obvious, but remember that no one is going to "pay attention" to any explanation of latent variables if the results are not good. Don't think of computational biology as machine learning. From my point of view, the idea is not to propose complicated methods that might improve performance by 1%, but to solve problems with relatively good results and simple models (in many cases a model originally used for a different task). The simpler your model is (assuming it comes with good results), the better its chances of being accepted by the community.