A global network of millions of genomes could be medicine's next great advance.
Availability: 1-2 years
Noah is a six-year-old suffering from a disorder without a name. This year, his physicians will begin sending his genetic information across the Internet to see if there's anyone, anywhere, in the world like him.
I think this is the technical implementation they are talking about:
The Beacon project is a project to test the willingness of international sites to share genetic data in the simplest of all technical contexts. It is defined as a simple public web service that any institution can implement as a service. The service is designed merely to accept a query of the form "Do you have any genomes with an 'A' at position 100,735 on chromosome 3" (or similar data) and responds with one of "Yes" or "No." A site offering this service is called a "beacon".
So it is just a federated query over multiple large genomics (+ phenotype) data sets. Full genomes are not centralized or moved, so privacy is less of a concern.
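To make the idea concrete, here is a minimal sketch of that kind of federated yes/no query. It is an in-memory toy, not the actual Beacon API: the `Beacon` class, the `has_allele` method, the `federated_query` function, and the site names are all made up for illustration, and a real beacon would sit behind a web service rather than a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Beacon:
    """Hypothetical stand-in for one site's beacon service."""
    name: str
    # Each entry is (chromosome, position, allele); the genomes themselves
    # never leave the site, only a yes/no answer does.
    variants: set = field(default_factory=set)

    def has_allele(self, chrom: str, pos: int, allele: str) -> bool:
        # Answers "Do you have any genomes with <allele> at <pos> on <chrom>?"
        return (chrom, pos, allele.upper()) in self.variants


def federated_query(beacons, chrom, pos, allele):
    """Ask every beacon the same question and collect the yes/no answers."""
    return {b.name: b.has_allele(chrom, pos, allele) for b in beacons}


# Two made-up sites holding different alleles at the same position.
site_a = Beacon("site_a", {("3", 100735, "A")})
site_b = Beacon("site_b", {("3", 100735, "G")})

answers = federated_query([site_a, site_b], "3", 100735, "A")
print(answers)  # {'site_a': True, 'site_b': False}
```

The point of the design is that the caller only ever sees a boolean per site, so nothing has to be centralized to find out where a matching genome might exist.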
What is important to recognize IMHO is that more data does not actually mean better information. I for one think that we need better and longer reads before it is worth sequencing every genome out there.
We are still in the learning phases of what to do and how to do it. Lots of bad data could be worse than no data at all. It may divert resources from long-term progress toward short-term gains.
It's telling that they only get to the privacy issue in the antepenultimate paragraph. That's the reason I would say no, this won't effectively happen in 2 years (this is in addition to what Istvan wrote). What constitutes informed consent for DNA testing is evolving rapidly (e.g., see here), and I can't really see that evolution reaching any sort of steady state in the next year or so. If they want answers, it's faster to just contact a diagnostics company. I was at one about a year ago, and they have a database like this of all of their samples that they use for filtering and querying. Since that's a private company with no data sharing, the consent wording and legality become rather easier.
Some sort of walled-garden approach is likely the only reasonable path for this sort of thing. Developing that is going to be tricky and time-consuming, given the multiple iterations it will end up going through.
The day I met Haussler, he was wearing a faded Hawaiian shirt...
David Haussler is always wearing a Hawaiian shirt. Last week I saw a talk by him where he put a suit jacket over it, though. Heh.
It was a good talk about how we are going to have to represent all the individual genomes. Currently we have one path we call the "reference" genome, and any alternate paths are treated as "second class citizens" in the software. I think this is an area that those of us on the software end of this debate have to be very attentive to. We need more flexible tools, better visualizations, and huge capacity because of all of this.
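For what it's worth, here is a rough sketch of what treating the reference as "just another path" could look like. This is my own toy illustration, not anything shown in the talk: the `SequenceGraph` class, its method names, and the sequences are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: nodes hold sequence, paths are walks over nodes."""
    nodes: dict = field(default_factory=dict)  # node id -> sequence
    paths: dict = field(default_factory=dict)  # path name -> list of node ids

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_path(self, name: str, node_ids: list) -> None:
        # Every path, the reference included, is stored the same way.
        missing = [n for n in node_ids if n not in self.nodes]
        if missing:
            raise KeyError(f"unknown nodes in path {name!r}: {missing}")
        self.paths[name] = list(node_ids)

    def sequence(self, name: str) -> str:
        # Spell out a genome by concatenating the nodes along its path.
        return "".join(self.nodes[n] for n in self.paths[name])


graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")     # allele carried by the reference
graph.add_node(3, "G")     # alternate allele at the same site
graph.add_node(4, "TTCA")

graph.add_path("reference", [1, 2, 4])
graph.add_path("sample_1", [1, 3, 4])

print(graph.sequence("reference"))  # ACGTATTCA
print(graph.sequence("sample_1"))   # ACGTGTTCA
```

Nothing in the data structure privileges the path named "reference"; it is only a label, which is roughly the kind of flexibility the tools would need.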
Hi, any chance the talk is online in video form or summarized somewhere? I'd be interested in hearing more. Thanks!
I don't think it was recorded, but I'll check.