Hi there
I have sequenced the exomes of 4 members of my family, myself included, done the SNP chip with 23andMe and sequenced my personal shitome (or gut microbiota if you will). Because I was doing this with private money and effort, first for myself and later for my whole family, the process has gone through many stages. I had my own exome sequenced in 2011. It cost me $999, which I raised privately with my family's support. The raw FASTQ data are available through figShare:
http://figshare.com/articles/Son_exome_files/92584
The sequencing was done with Illumina HiSeq. As soon as I could, I made these data available through my blog for people to download. I made sure they had a public domain license. The only thing I asked was that people, if they so wished, feed back to me any interesting findings they came across. Here is the blog entry:
http://manuelcorpas.com/2012/01/23/my-personal-exome-now-publicly-released/
The response was so overwhelming that companies such as Oxford Gene Technology (OGT) even analysed my exome for free, as they were developing their own internal pipeline. They gave me back the SNPs derived from my exome using GATK, along with a report which I published on my blog. Looking at the summary metrics in OGT’s report, my personal exome produced:
- 30,702 variations relative to the reference genome (GRCh37)
- 5,565 non-synonymous coding variations with consequences
- A minimum of 61.42% of the on-target regions covered with a depth of at least 20x
- A total of 2.54 Gigabases of sequence data read and aligned at high quality.
These data are available here (again, in the public domain):
http://figshare.com/articles/SNPs_from_the_Son's_exome_data/92819
I felt that this opportunity to crowdsource the analysis by publishing the data was really effective. I talked it through with my family and we decided together to publish and release all our data (with their informed consent). I reported how this process went in a publication in the Journal of Genetic Counseling, "A family experience of personal genomics":
http://link.springer.com/article/10.1007%2Fs10897-011-9473-7
After that, given that we still did not have sequencing data for the rest of my family, we started a crowdfunding campaign. The aim of this campaign was to raise money to sequence the whole genomes of 5 family members. At the time (early 2012) this cost $20,000. I was able to raise >$3,000, which some "experts" considered a failure. This story was picked up by the journal Science:
http://news.sciencemag.org/2012/08/keeping-it-family
With the money raised we were able to sequence the exomes of 3 members of my family: my mum, dad and sister. These data were made public as soon as they were received, and I uploaded them for people to use, in the hope that companies, users and anyone else would help us find interesting things. Again, this was picked up by several companies interested in testing their pipelines or simply wanting to raise awareness of their products. Notably, InSilicoDB did the BWA alignment and GATK variant calling, also including my exome. These data are readily available for analysis through InSilicoDB:
https://insilicodb.org/app/browse?q=ISDB11122
In there you will also find the metagenomics data from my own poo, in case you are interested. After my initial call for crowdsourcing the analysis of my personal fecal DNA, I had some results sent back via Twitter from Willy Valdivia at Orion Biosciences. Willy kindly sent me a couple of figures with a histogram of the percentage of DNA for the top 25 organisms in my fecal sample:
http://manuelcorpas.com/2013/06/10/a-glimpse-of-what-my-fecal-dna-contains/
The VCF file with the variants for the 4 exomes is available here:
http://figshare.com/articles/VCF_file_from_Son/803101
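If you want a quick first look at a multi-sample VCF like this one, here is a minimal sketch in plain Python (standard library only) that counts, for each sample, how many records carry at least one non-reference allele. It is not the pipeline that OGT or InSilicoDB used. The function names, the example file name and the assumption that every record has a GT field in its FORMAT column are mine, not something taken from the file itself.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def count_variant_calls(lines):
    """Count, per sample, the records whose genotype has a non-reference allele.

    `lines` is any iterable of VCF lines (an open file handle works).
    Sample names are read from the #CHROM header line.
    """
    samples = []
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            counts = {name: 0 for name in samples}
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this record
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for name, call in zip(samples, fields[9:]):
            parts = call.split(":")
            if gt_index >= len(parts):
                continue
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = parts[gt_index].replace("|", "/").split("/")
            # "0" is the reference allele and "." is a missing call.
            if any(allele not in ("0", ".") for allele in alleles):
                counts[name] += 1
    return counts
```

To use it on a downloaded file (the file name here is hypothetical), something like `with open_vcf("family_exomes.vcf") as handle: print(count_variant_calls(handle))` should be enough. The counts will not necessarily match the figures in a vendor report, since those usually apply extra filtering.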
Some people have referred to these sources of data as "The Corpasome", and I recently published an article describing our crowdsourcing efforts with the data available at the time. By then it had become clear how useful this approach could be for understanding the genomic information contained in our personal genomics tests.
http://www.scfbm.org/content/8/1/13
I have produced a video where I explain where this adventure has taken us so far:
http://www.youtube.com/watch?feature=player_detailpage&v=xANVOo0oR04
Finally, I would like to thank Albert Vilella for bringing this BioStar thread to my attention.
BGI has agreed to do it for one family: http://manuelcorpas.com/2012/06/21/crowdfunding-genome-project-day-2-bgi-officially-agrees-sequencing/
Not worried about conflict of interest?
Do you mean if I sequence at an academic institution? Or between me and insurance companies?
With an academic institution.
I certainly would be. That is a large part of why I asked this question: to determine how other bioinformaticians go about doing this without using the resources at their fingertips.
I am really curious about the answers, as I have already spent some time thinking about this :) I think the main problem is that DNA extraction and library preparation are not things you can do at home, and even if you are ready to cover all the expenses and send the sample to a sequencing company at your own cost, university administration is not prepared for this. They have no way to take your money for it, even if you would love to pay them (I would be happy to find out I am wrong about this). As for insurance companies: why would you report _anything_ to them? With all the possible errors and uncertainties in analyzing such data, could you really be convinced that you are prone to some serious disease? Could an insurance company be? Also, I consider DNA to be purely personal information about you.
The way I did the sequencing was to send saliva samples to the BGI using Oragene kits. It took a little convincing, because at the time the BGI would only do the sequencing from blood samples, which meant the samples had to be sent on ice and under very special conditions. And how was I going to extract my blood? Unless I did it in Spain with a clinician friend, extracting blood in the UK with no contacts for a purpose like this was completely unthinkable.
I am very grateful to Michelle Mao, the UK sales rep from BGI at the time, who did the negotiations with the BGI and allowed me to do the sequencing using Oragene kits. The BGI did the DNA extraction and the libraries for free.
At that time Oragene could send you 4 free trial test tubes (you should check their website, perhaps they still offer this). Unfortunately by the time I found the money to sequence my family, the DNA stabilising suspension for the Oragene kit had expired, so I asked some friends at the Sanger Institute if they could spare some spit tubes (they were running an internal trial with staff from the Genome Campus to sequence some SNP markers).
Luckily Jeff Barrett and Lizzy Langley at the Sanger let me have a few kits. Lizzy kindly posted them to my door! With Oragene it was really easy: we just spat in the test tubes and sent them to the sequencing company, in our case in Hong Kong, which is where everything is sent for the BGI. I was able to send my samples with DHL and I think it cost me ~£80 from the UK.
The picture above shows my hand and the Oragene spit test tubes with my mum, dad and sister's saliva ready to be sent.
The good thing about Oragene kits is that, unlike blood, they do not need to be kept on ice; as soon as you close the tube, a suspension is mixed in that stabilises the DNA. So it is clean and easy to transport. Sending my poo sample, however, was a totally different story. I can talk about that in another post if you like ;-)
Thanks for the thoughts. I completely agree about not reporting to the insurance company! However, there are some legal implications that will need to be addressed directly in the future. For instance, if you smoke, you are legally required to report this to insurance companies, otherwise you may forfeit coverage due to failing to report your accurate medical history. I would keep my sequence completely private, but I wonder how long that will remain a possibility.
Assuming you're in the US, aren't the health insurance ramifications covered by the Genetic Information Nondiscrimination Act?
Good point! I certainly hope there won't be loopholes to take advantage of. I think back to the Curry case in 2005-2006: http://sports.espn.go.com/nba/news/story?id=2180298
Yeah, the devil is going to be very much in the details (or ambiguously written sub-sub-sub-clause)! I should note that GINA is only supposed to be for health insurance. Life insurance and such could still reject you (assuming you told them anything...).