While I think Birney did a better job of qualifying his statements in his blog post, there is simply no excuse for the way the 80% figure was handled in the actual publications and, more importantly, in the press releases and commentary put out by ENCODE. I also wasn't very happy with the way Birney essentially dismissed the noise argument out of hand. We know that ENCODE's data is noisy and full of false positives, which is OK because ENCODE's job is to generate massive amounts of data to produce hypotheses that can be investigated in greater detail and combined with other data sources for different analyses. But that needs to be clear in their publications and it flat out isn't. Transcription factors will bind to random non-promoter sites. Those sites COULD become functional in the future, but currently aren't. Transcription is noisy; a lot of transcripts get generated from pretty random portions of the genome. To call all of that functional is silly, plain and simple. I like the ENCODE project overall, but I was pretty pissed with the sloppy usage of "functional" and the number of false positives generated by the nature of their cutoffs and analyses.
I understand the issue with the false positives. But personally I'd rather have the data and be able to then filter for signal levels I think are appropriate. And it's not hard to ask for only signals over a certain value and proceed with your own analysis afterwards.
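To make the "filter it yourself" point concrete, here is a minimal sketch in Python. It assumes the peaks come as tab-separated narrowPeak lines (BED6+4, where the seventh column is signalValue). The function name `filter_peaks`, the example records, and the cutoff of 10.0 are all made up for illustration; an appropriate threshold depends on the assay and the analysis.

```python
from typing import Iterable, Iterator, List

# 0-based index of the signalValue column in a narrowPeak (BED6+4) record.
SIGNAL_COLUMN = 6


def filter_peaks(lines: Iterable[str], min_signal: float) -> Iterator[List[str]]:
    """Yield the fields of each peak whose signalValue is >= min_signal."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if float(fields[SIGNAL_COLUMN]) >= min_signal:
            yield fields


# Hypothetical records, not real ENCODE data.
example = [
    "track name=example",
    "chr1\t100\t200\tpeak_a\t0\t.\t3.2\t-1\t-1\t50",
    "chr1\t500\t650\tpeak_b\t0\t.\t25.0\t-1\t-1\t70",
    "chr2\t900\t980\tpeak_c\t0\t.\t9.9\t-1\t-1\t40",
]

# The cutoff is an arbitrary illustration; pick one suited to your analysis.
strong = list(filter_peaks(example, min_signal=10.0))
strong_names = [fields[3] for fields in strong]
```

With real data you would pass an open file handle instead of the `example` list and then carry on with your own analysis on whatever survives the cutoff.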
I totally agree. I have no problem with the ENCODE data; I think it is a very valuable resource, and I use it all of the time. What I didn't like was the hype and commentary in the published papers surrounding the data release.
My problem is that the ENCODE concept of function shows a completely failed understanding of biochemistry, evolution and genetics. We know that enzymes are not "perfect". We can reliably predict that DNA-binding proteins will bind in useless ways and that RNA transcription will occur in useless places. We also know how selection works. Knowing this, we know that lots of RNA will be transcribed that has no selective advantage, with the only disadvantage being the energy consumed. And we know that the totality of RNA transcription is less than 1% of the energy costs of the cell. So even if spandrel transcription made up 10% of all RNA transcribed, the selective pressure against it would be so minimal as to make it almost impossible to evolve away. The same goes for added DNA length. This is not some blinding insight. We teach it to undergrads. Graduate students should be able to figure it out for themselves.
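As a rough back-of-the-envelope version of that argument, assuming energy cost scales linearly with the amount of RNA transcribed (a simplification made here only for illustration), and using the symbols $f_{\text{RNA}}$ for the share of the cell's energy budget spent on transcription and $f_{\text{useless}}$ for the share of transcripts with no function:

```latex
\[
  f_{\text{RNA}} < 0.01, \qquad f_{\text{useless}} = 0.10
\]
\[
  \text{cost of useless transcription}
  = f_{\text{RNA}} \times f_{\text{useless}}
  < 0.01 \times 0.10 = 0.001
\]
```

That is under 0.1% of the cell's energy budget. This bounds the energy cost only; it is not itself a selection coefficient, but it shows how small the cost being selected against would be.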
I expressed my opinion in a Twitter conversation yesterday that this article is getting attention for its (imo unnecessary) tone and some admittedly good quips. Without those, it's just an overview of the conversations that happened everywhere in September among bioinformaticians (and much more widely).
Thanks for the link - that makes for an interesting read.
My first instinct is to disagree with the paper's criticism of the hype and the inconsistencies in how the summary of the ENCODE findings was communicated. I think that is just a byproduct of the way we produce and consume information in this age, and we can't really fault any individual for it.
The paper also contains more objective and substantial criticism of the methodology and results. The validity of that criticism still needs to be determined. I think that the conclusions of a project of this magnitude will be (or already are) treated as a starting paradigm by a large number of life scientists. Therefore the burden of proof needs to be higher than for other papers, as it affects the direction an entire field of science takes.