Proud to come here to BioStars to announce that we're offering 30x whole-genome sequencing for $3,490! This seems like the kind of place where like-minded people have also really, really, really wanted to get their own genome sequenced. Well, at least my co-founder and I have really wanted to do it. Recode just ran a story on us!
Personal genome sequencing can be framed in terms of "helping science" or "learning about yourself", but it is undeniable that the vast majority of genome-related information concerns human health. So it may be immaterial how one labels the product if it can mostly be used in one way.
It is a bit like selling a product with health implications that have not actually been verified. There are plenty of such products, and it seems the only requirement is to label them "This product has not been approved by the FDA". I wonder whether any or all genome sequences or analyses should be labeled as such, and whether that would in fact satisfy regulatory bodies. These are uncharted waters.
After all, just as Uber disrupted the heavily regulated taxi industry, could bioinformaticians disrupt the medical industry? Or could an answer on Biostars interpreting someone's data be construed as an essential component of a diagnosis? Stranger things have happened.
Unfortunately, due to the need to rebrand (how did we NOT know there is a thing called Genohub?) and the slow pace of sign-ups, we've decided to suspend our Kickstarter campaign.
We're figuring out what to do next, including whether to just build the platform for our own use and find a way to fund it later.
Thanks for the support on this board. We'll be around!
On Genohub I already see two packages offering 35x coverage on the HiSeq X Ten for $1,800 (one commercial in Seoul, one academic in Australia). With the X Ten rolling out over the next half year, I think it should not be a problem to find similar packages reliably. (These two may just be filling spare capacity.) And buying in bulk should make it even easier to find labs offering a package at this price.
Right, and that's basically what we're using. However, there are issues with doing this for individuals rather than businesses or established labs. First, labs will not provide any method for sample collection. It's typically up to you to get blood drawn or to persuade a sales rep at a supplier to sell you a collection kit. Then it's up to you to ship the samples internationally, with customs clearance, etc. And finally, most of these labs require you to mail a hard drive along with the sample, which they then mail back to you for an extra fee.
It's also unlikely that most of these labs will sell to individuals at all. Why? Huge compliance and regulatory risk. Being a lab that processes samples is one thing; offering a 'consumer' or 'hobbyist' kit is something else entirely and requires a whole boatload of lawyering (something we're already doing).
If you work in a lab now and are fine negotiating contracts with different suppliers and are used to mailing biological samples, etc... then absolutely this is doable for you.
Also, my search did not bring up that price on Genohub. The lowest I saw for a single 30x whole human genome was $3,050.
We're basically doing all that legwork for the difference in price: we buy the kits and handle all the paperwork and bs.
updated 2.9 years ago by Ram (44k) • written 10.2 years ago by hcatlin (100)
Your approach of storing only the variants is pretty clever. But what's your contingency plan for when the reference you're using becomes obsolete? There are plenty of options; I'm just curious to see your thinking.
Are you really planning to ONLY store variants? I don't think that is a good idea. People have dreamed of a graph-based representation like the one you describe for years, but I have yet to see a convincing and practical solution (despite many smart people working on it). It is harder than it seems. Even more importantly, by not keeping the raw reads you lose the ability to take advantage of new alignment and variant-calling methods. The cost of storing a 30x whole genome's worth of raw unaligned reads (~80 GB compressed) is relatively negligible.
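A quick back-of-the-envelope check of that storage figure, assuming a haploid human genome of roughly 3.1 billion bases (an approximation, not a number from this thread):

```python
# Back-of-the-envelope check of the ~80 GB figure for 30x raw reads.
# Assumption: haploid human genome of ~3.1 billion bases (approximate).
GENOME_LENGTH_BP = 3.1e9
COVERAGE = 30
COMPRESSED_SIZE_BYTES = 80e9  # "~80GB compressed", as quoted in the thread


def total_sequenced_bases(genome_length_bp: float, coverage: float) -> float:
    """Total bases sequenced = genome length x mean coverage."""
    return genome_length_bp * coverage


def bytes_per_base(size_bytes: float, n_bases: float) -> float:
    """Average storage cost per sequenced base."""
    return size_bytes / n_bases


bases = total_sequenced_bases(GENOME_LENGTH_BP, COVERAGE)
print(f"{bases / 1e9:.0f} Gbases sequenced")  # 93 Gbases sequenced
print(f"{bytes_per_base(COMPRESSED_SIZE_BYTES, bases):.2f} bytes per base")  # 0.86 bytes per base
```

So ~80 GB for 30x works out to a bit under one byte per sequenced base, under the genome-length assumption above.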
Of course we will keep the BAM files in "cold storage" as static files. But the system for querying them live has to be far more efficient, and that's what will be variant-based.
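A minimal sketch of what a variant-based live query layer could look like, assuming variants are held in memory per chromosome and sorted by position, so that a region lookup is two binary searches. `Variant` and `VariantIndex` are hypothetical names, not part of the poster's system, and the positions below are made-up toy data.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 1-based position on the reference
    ref: str
    alt: str


class VariantIndex:
    """Hypothetical in-memory index: variants per chromosome, sorted by position."""

    def __init__(self, variants: Iterable[Variant]) -> None:
        self._by_chrom: Dict[str, List[Variant]] = {}
        for v in sorted(variants, key=lambda v: (v.chrom, v.pos)):
            self._by_chrom.setdefault(v.chrom, []).append(v)
        # Parallel list of positions so bisect can search on plain ints.
        self._positions = {c: [v.pos for v in vs] for c, vs in self._by_chrom.items()}

    def query(self, chrom: str, start: int, end: int) -> List[Variant]:
        """Return variants with start <= pos <= end (1-based, inclusive)."""
        positions = self._positions.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return self._by_chrom.get(chrom, [])[lo:hi]


# Toy data only; these are not real variant calls.
index = VariantIndex([
    Variant("chr1", 1500, "A", "G"),
    Variant("chr1", 120, "C", "T"),
    Variant("chr2", 300, "G", "GA"),
    Variant("chr1", 900, "T", "C"),
])
print([v.pos for v in index.query("chr1", 100, 1000)])  # [120, 900]
```

Bisecting sorted positions keeps each region lookup logarithmic in the number of variants on that chromosome, without touching the read files in cold storage.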
The initial version will probably just be layers of variants on the reference, but I'm hoping to replace that quickly, as I think it introduces unnecessary bias into how we interpret the data (oh, this is *not normal*).
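A minimal sketch of the "layers of variants on the reference" idea: a sample's sequence is rebuilt by splicing each stored (pos, ref, alt) edit into the reference. `apply_variants` is a hypothetical helper; it assumes 1-based positions, non-overlapping variants, and a toy reference string.

```python
from typing import Iterable, Tuple

Edit = Tuple[int, str, str]  # (1-based pos, ref allele, alt allele)


def apply_variants(reference: str, variants: Iterable[Edit]) -> str:
    """Rebuild a sample sequence by splicing variant edits into the reference.

    Assumes 1-based positions, non-overlapping variants, and ref alleles that
    match the reference; violations raise ValueError.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"ref allele mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged reference stretch
        pieces.append(alt)                      # this sample's allele
        cursor = start + len(ref)
    pieces.append(reference[cursor:])  # tail after the last variant
    return "".join(pieces)


reference = "ACGTACGTAC"
variants = [(3, "G", "T"), (6, "CG", "C"), (9, "A", "AGG")]  # SNV, deletion, insertion
print(apply_variants(reference, variants))  # ACTTACTAGGC
```

The reconstructed sequence is only as good as the reference string it is applied to, which is exactly why the choice of reference matters for a variants-only store.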
My plan is to implement a full graph approach with no real reference, where each genome is just a path through the graph of all our stored genomes.
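A toy sketch of the graph idea, assuming a graph of sequence-labelled nodes joined by directed edges, with each stored genome kept as an ordered list of node IDs. `SequenceGraph` and the node contents are hypothetical; production graph-genome tools use far richer structures than this.

```python
from typing import Dict, List, Set, Tuple


class SequenceGraph:
    """Toy sequence graph: nodes carry sequence, directed edges link them.

    No node is privileged as "the reference"; every stored genome is just an
    ordered path of node IDs.
    """

    def __init__(self) -> None:
        self.nodes: Dict[int, str] = {}
        self.edges: Set[Tuple[int, int]] = set()

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.add((src, dst))

    def spell(self, path: List[int]) -> str:
        """Concatenate node sequences along a path, checking every step is an edge."""
        for src, dst in zip(path, path[1:]):
            if (src, dst) not in self.edges:
                raise ValueError(f"no edge {src} -> {dst}")
        return "".join(self.nodes[n] for n in path)


graph = SequenceGraph()
for node_id, seq in {1: "ACGT", 2: "A", 3: "G", 4: "CCTA"}.items():
    graph.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:  # a single "bubble" at one site
    graph.add_edge(src, dst)

genomes = {"genome_A": [1, 2, 4], "genome_B": [1, 3, 4]}  # hypothetical genomes
for name, path in genomes.items():
    print(name, graph.spell(path))
# genome_A ACGTACCTA
# genome_B ACGTGCCTA
```

Neither genome is stored as a diff against the other; adding a new genome means adding whatever nodes and edges its path needs.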
I like it! Thanks for the explanation.
I think you probably should have done some trial runs and cooked up some demos using your own sequence before you launched this Kickstarter.
Actually, that was part of the issue... we couldn't find a lab that would work with 2 samples! They needed us to do the legal legwork, etc. Plus, sample collection was an issue. We're hoping to get the "group discount" and pay into the bulk purchase ourselves.