Changes of bivalent chromatin coincide with increased expression of developmental genes in cancer
Abstract
Bivalent (poised or paused) chromatin comprises activating and repressing histone modifications at the same location. This combination of epigenetic marks at promoter or enhancer regions keeps genes expressed at low levels but poised for rapid activation. Typically, DNA at bivalent promoters is only lowly methylated in normal cells, but frequently shows elevated methylation levels in cancer samples. Here, we developed a universal classifier built from chromatin data that can identify cancer samples solely from hypermethylation of bivalent chromatin. Tested on over 7,000 DNA methylation data sets from several cancer types, it reaches an AUC of 0.92. Although higher levels of DNA methylation are often associated with transcriptional silencing, counter-intuitive positive statistical dependencies between DNA methylation and expression levels have been recently reported for two cancer types. Here, we re-analyze combined expression and DNA methylation data sets, comprising over 5,000 samples, and demonstrate that the conjunction of hypermethylation of bivalent chromatin and up-regulation of the corresponding genes is a general phenomenon in cancer. This up-regulation affects many developmental genes and transcription factors, including dozens of homeobox genes and other genes implicated in cancer. Thus, we reason that the disturbance of bivalent chromatin may be intimately linked to tumorigenesis.
Read the complete publication: http://www.nature.com/articles/srep37393
This is an article in Scientific Reports, which is NPG's equivalent to PLOS One. Although I care more about the quality of the work than where it is published, I wouldn't refer to all journals published by NPG as "Nature".
Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes
Check out this article too. It was done entirely with publicly available microarray data and published in Nature Communications. I highly recommend looking at the reproducible code in their supplement: beautiful R code for regression models and plotting, compiled with knitr!
Sorry... What is the purpose of this post? I mean, it's an interesting paper, but... why post it on Biostars?
It nicely shows how one can use publicly available data, created to answer completely different questions, for a completely new analysis. And these results can be published in Nature. I think this is good news for bioinformaticians, who pretty often think they can only do this kind of work together with wet labs and expensive sequencing runs.
Fair enough, but I think ENCODE, 1000 Genomes, Blueprint, etc. were produced and made public partly to enable other researchers to mine these data. There are a lot of papers using these data, so I'm not sure this paper is anything special in this respect.
I did not claim that it is anything special in this respect. It is just an example. I can delete the post if that makes you feel better. I am not in the mood for this discussion, sorry. It is a new year and I do not want to spam anyone.
I know the people who wrote it, and they were proud that they could publish in such a high-profile journal without expensive experiments. So I thought it might be worth sharing this experience.
Sorry... I was just trying to understand...
You don't have to be sorry. I got your point and changed the title. I just don't want to make a mountain out of a molehill.
I like discussions, but sometimes it is just not worth it. ;)
I had the same question as dariober, but your answer makes sense. Perhaps including it in the top post would clarify quite a bit.
Well, good point. I changed the title.