Why Is SNP Data Generally Restricted?
13.0 years ago
Dsimcha ▴ 60

Whenever I've looked at public data websites like Gene Expression Omnibus or The Cancer Genome Atlas, it seems like SNP datasets are restricted access. I vaguely understand that this is related to privacy concerns, since a SNP profile could theoretically uniquely identify a person. However, this seems ridiculous because to uniquely identify the person from SNP data, you'd need the person's genome or SNP profile. These are not things that can be obtained easily and covertly, or legally without consent. Furthermore, such a policy of burying SNP data in a layer of red tape and requiring a separate request to be filed for every specific use discourages exploratory research and data mining.

Why is there so much concern about what seems to be such a theoretical issue? Is there anywhere where large amounts of de-identified human SNP data are available for data mining purposes without layers of red tape?

EDIT: I'm mostly interested in case-control SNP data, which seems particularly hard to find.

snp data • 5.2k views
13.0 years ago

However, this seems ridiculous because to uniquely identify the person from SNP data, you'd need the person's genome or SNP profile. These are not things that can be obtained easily and covertly, or legally without consent.

This is true at present, but may not be true in the future. Sequencing is getting cheaper and more accessible by the day. If anonymous SNP data is made public today, it's public forever. Ten years from now, it may be trivial to get a DNA sample from a dropped hair or saliva left behind on a glass. Once you've sequenced that DNA, it'd be easy to match it up with a database of publicly released data and figure out what disease studies that person participated in. This is a breach of medical confidentiality and is generally not cool.
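To make the matching step concrete, here is a minimal sketch of how a freshly genotyped sample could be compared against released profiles. Everything in it is made up for illustration: the sample IDs, the genotypes, the thresholds, and the helper names `concordance` and `best_match` are assumptions, not any real database or tool. Genotypes are coded as minor-allele counts (0, 1, 2), with `None` for a missing call.

```python
from typing import Dict, List, Optional, Tuple

# One entry per SNP: 0, 1 or 2 copies of the minor allele, None if not called.
Genotype = List[Optional[int]]


def concordance(a: Genotype, b: Genotype) -> Tuple[float, int]:
    """Return (fraction of identical calls, number of SNPs compared)."""
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip SNPs missing in either profile
        compared += 1
        if x == y:
            matches += 1
    return (matches / compared if compared else 0.0, compared)


def best_match(query: Genotype,
               database: Dict[str, Genotype],
               min_snps: int = 10,
               min_concordance: float = 0.95) -> Optional[str]:
    """Return the ID of the most concordant profile, or None if nothing passes."""
    best_id, best_score = None, 0.0
    for sample_id, profile in database.items():
        score, n = concordance(query, profile)
        if n >= min_snps and score >= min_concordance and score > best_score:
            best_id, best_score = sample_id, score
    return best_id


# Hypothetical "public" release of de-identified profiles (12 SNPs each).
public_db = {
    "study_A_001": [0, 1, 2, 0, 1, 1, 0, 2, 1, 0, 0, 1],
    "study_A_002": [1, 1, 0, 2, 0, 1, 2, 0, 1, 1, 0, 2],
    "study_B_017": [2, 0, 1, 1, 0, 2, 1, 1, 0, 2, 1, 0],
}

# Genotype recovered from a hypothetical dropped hair, with one failed call.
hair_sample = [1, 1, 0, 2, None, 1, 2, 0, 1, 1, 0, 2]

print(best_match(hair_sample, public_db))  # the profile this person contributed
```

With only 12 toy SNPs this proves nothing statistically; real arrays have hundreds of thousands of markers, so an exact or near-exact match is far more specific. The point is only that once you hold the person's genotype, the lookup itself is trivial.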

13.0 years ago
User 59 13k

Have you had a look at the 1000 Genomes project? Or all the Complete Genomics public data? There's more SNP data out there on individuals than you can shake a stick at. You must be looking in the wrong places :)

GEO is the "Gene Expression Omnibus", and whilst it may deal largely in expression data, that doesn't mean SNP data isn't represented. Have a look at this, as an example. Just because CGAP attempts to summarise the data, I don't think that means it's burying it under some fear of litigation from a genotyped patient.


OK, that is a more valid question. I've just sent off my 23andMe sample to be genotyped, and I'm currently thinking a lot about whether to release that publicly. Arguments for: I get to contribute a tiny smidgen of personal data to science. Arguments against: there is no UK equivalent of GINA (as far as I know), so releasing information that could be tied to me, or is explicitly tied to me, has unknown future legal ramifications (not to be resolved until 2014). I fear it is not so much identification that concerns people, but discrimination.


Yeah, I was definitely looking in the wrong places then. To rephrase the question, why do some people treat SNP data as confidential, enough to confuse the issue?


Maybe this article will be helpful for your latter question: http://science.sciencemag.org/content/305/5681/183

13.0 years ago

A short answer to the rephrased question (why do some people treat SNP data as confidential, enough to confuse the issue?) basically centers on IRB (Institutional Review Board) approval and consent of the individuals participating in the study. In order for researchers to gain access to either the subjects' DNA or their data, anonymity must be guaranteed. This is a key factor in our analysis of genotyping data from thousands of individuals.

13.0 years ago
Mary 11k

There are sometimes other issues: many times in the "big data" projects there are also constraints on using the data prior to publication. Sometimes that comes in the form of access requests to ensure that people are not breaking the "embargo" or "moratorium" period. There was a paper based on dbGaP that had to be retracted for breaking that a couple of years ago.

There was a paper published a few years ago that showed it was possible to re-identify individuals from SNP data, and at that time NHGRI pulled several data sets that had previously been public. I'll get that reference later and edit this.

I think this is the paper that caused the conflama: Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays
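For anyone wondering how summary data alone can give someone away, here is a rough simulation in the spirit of the distance statistic from that paper, as I understand it: at each SNP, compare how far a person's allele frequency (0, 0.5 or 1) is from a reference group versus from the pooled group, then run a one-sample t-test across SNPs. The population, group sizes, SNP count, and function names below are all assumptions chosen for the demo, not the paper's data or code.

```python
import math
import random
from statistics import mean, stdev


def draw_person(freqs, rng):
    """Simulate one diploid genotype per SNP as an allele frequency: 0.0, 0.5 or 1.0."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def allele_freqs(people):
    """Per-SNP allele frequency of a group: the kind of summary a study might publish."""
    return [sum(col) / len(col) for col in zip(*people)]


def membership_t(person, pool, reference):
    """t-statistic of D_j = |Y_j - Ref_j| - |Y_j - Pool_j| over all SNPs.

    Clearly positive: the person looks closer to the pool than to the reference.
    Near zero: no evidence the person is in either group.
    """
    d = [abs(y - r) - abs(y - m) for y, m, r in zip(person, pool, reference)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


rng = random.Random(0)          # fixed seed so the demo is repeatable
n_snps, group_size = 10_000, 20  # illustrative sizes only

# Hypothetical population allele frequencies, one per independent SNP.
population = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]

cases = [draw_person(population, rng) for _ in range(group_size)]
controls = [draw_person(population, rng) for _ in range(group_size)]
outsider = draw_person(population, rng)

case_freqs = allele_freqs(cases)        # "published" case summary
control_freqs = allele_freqs(controls)  # "published" control summary

t_case = membership_t(cases[0], case_freqs, control_freqs)
t_control = membership_t(controls[0], case_freqs, control_freqs)
t_outsider = membership_t(outsider, case_freqs, control_freqs)

print(f"case member: {t_case:.1f}")
print(f"control member: {t_control:.1f}")
print(f"outsider: {t_outsider:.1f}")
```

In this toy setup a case member should score strongly positive, a control member strongly negative, and an outsider near zero. Real data has linkage between SNPs, population structure, and measurement noise, so treat this as intuition for why per-SNP case and control frequencies were treated as sensitive, not as a reproduction of the published analysis.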

This blog post explains more of the story and some follow-up, and also links to the NHGRI announcement about pulling the data: Re-identification and its Discontents

Edit 3: and here's a link to the dbGaP story: http://blog.openhelix.eu/?p=2565

13.0 years ago
User 59 13k

And of course now we have another source: OpenSNP

13.0 years ago
Shigeta ▴ 470

Still, there are even more individual SNP profiles available than 1000 Genomes and Complete Genomics have to offer.

There was a Genome Wide Association Studies repository in preparation at NIH for experimentalists to deposit their data, much like GEO is for expression data.

dbGaP (Genotypes and Phenotypes)

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap

It's still pretty new, but it will get huge as time goes on.


I looked at dbGaP a while ago and my conclusion was that basically everything worth having was controlled access.


I can imagine. When the journals start requiring deposition (as they did for X-ray structures and expression data), you will start seeing more...

5.2 years ago
anneykim • 0

I've also experienced a lot of frustration getting access to SNP data. My research at MIT CSAIL with Professor Manolis Kellis has actually made a pretty neat tool for preserving data privacy and efficient clinical data access (arxiv paper: https://arxiv.org/abs/1811.01431 ). If you're interested in testing it out, please sign up :) https://secureailabs.com/
