Clinically-Associated SNPs
13.2 years ago
Vova Naumov ▴ 220

Hi! We are trying to work out which Illumina chip is better for testing medical conditions. I used this MySQL query to get a list of clinically associated SNPs:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
SELECT *
FROM
  snp132 s
WHERE
  s.bitfields LIKE "clin%"'

So now I have a list of about 22,000 rs IDs, and I would like to know which association the database means for each of them. There was a question on Biostar (http://biostar.stackexchange.com/questions/1289/disease-associated-snps) that could have helped me, but since 16 July the OMIM table is no longer in the genome database. So the question is: how can I get a list of diseases/conditions for this SNP list?

snp disease database • 7.5k views

hg18 does not have table snp132; I think you must have used hg19.

Sure, sorry, I'll change it.

13.2 years ago

1) Register for access to the OMIM FTP site at http://omim.org/downloads and download mim2gene:

$ curl -s  "ftp://anonymous:xxxxxxx@xxxxx.edu/OMIM/mim2gene.txt" | head
# Mim Number    Type    Gene IDs    Approved Gene Symbols
100050    phenotype    -    -
100070    phenotype    100329167    -
100100    phenotype    -    -
100200    phenotype    -    -
100300    phenotype    100188340    -
100500    moved/removed    -    -
100600    phenotype    -    -
100640    gene    216    ALDH1A1
100650    gene/phenotype    217    ALDH2

Get a list of the gene symbols:

$ curl -s "ftp://anonymous:xxxxx@xxxxxx.edu/OMIM/mim2gene.txt" |\
   grep -v '^#' | cut -f 4 | grep -v '^-$' |\
   sort -u > list1.txt

2) Get the list of SNPs associated with each gene symbol. Something like:

mysql -N --user=genome --host=genome-mysql.cse.ucsc.edu -A  -D hg19 -e 'select  distinct
  G.geneSymbol,
  S.name
from snp132 as S,
kgXref as G,
knownGene as K where
    S.chrom=K.chrom and
    S.chromStart>=K.txStart and
    S.chromEnd<=K.txEnd and
    K.name=G.kgId 
    /* AND something to restrict the result to YOUR list of SNPs or gene */
' | sort -t $'\t' -k1,1 > list2.txt

3) Use Unix join to join the two lists:

join -1 1 -2 1 list1.txt list2.txt

You should get a list with two columns: the OMIM gene and your SNP.
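
For anyone who wants to try step 3 without OMIM access, here is a minimal sketch of the join on toy data. The gene symbol GENEX and all rs IDs are made-up placeholders, not real associations. The sketch assumes both lists are tab-delimited and sorted under the same locale, which join requires.

```shell
# Toy stand-ins for list1.txt (gene symbols) and list2.txt (symbol<TAB>rs ID).
# All rs IDs and the symbol GENEX are hypothetical placeholders.
export LC_ALL=C                 # same collation for sort and join
tmp=$(mktemp -d)
TAB=$(printf '\t')              # portable way to get a literal tab

printf '%s\n' ALDH2 ALDH1A1 | sort -u > "$tmp/list1.txt"

printf '%s\t%s\n' \
    ALDH2   rs0000002 \
    GENEX   rs0000003 \
    ALDH1A1 rs0000001 |
    sort -t "$TAB" -k1,1 > "$tmp/list2.txt"

# Keep only the rows of list2 whose gene symbol also appears in list1.
join -t "$TAB" -1 1 -2 1 "$tmp/list1.txt" "$tmp/list2.txt" > "$tmp/joined.txt"
cat "$tmp/joined.txt"
```

Setting LC_ALL=C for both sort and join keeps the two files in the same order, so join does not skip rows or complain that its input is unsorted.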

Thank you very much! I always knew that these Unix commands are very useful. I also tried to use the /OMIM/genemap file to get rs numbers from the 12th column, but there were only 209 rs numbers in common between the clinically associated SNPs and the numbers from this file.
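
An overlap count like the one above can be obtained with cut, sort and comm. This is only a sketch on toy data: the file names and rs IDs are hypothetical, and it assumes a tab-delimited table with the rs ID in column 12 (pass the right `-d` to cut if the real file uses another delimiter).

```shell
export LC_ALL=C
tmp=$(mktemp -d)

# Hypothetical clinically associated rs IDs, one per line.
printf '%s\n' rs0000003 rs0000001 rs0000002 | sort -u > "$tmp/clinical.txt"

# Hypothetical tab-delimited table with an rs ID in column 12.
for rs in rs0000002 rs0000009 rs0000003; do
    printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\t%s\n' "$rs"
done > "$tmp/table.txt"

# Extract column 12 and de-duplicate; comm needs sorted input.
cut -f 12 "$tmp/table.txt" | sort -u > "$tmp/table_rs.txt"

# comm -12 prints only the lines present in both sorted files.
comm -12 "$tmp/clinical.txt" "$tmp/table_rs.txt" > "$tmp/common.txt"
wc -l < "$tmp/common.txt"
```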

13.2 years ago

dbSNP includes clinically significant variations and you can now filter search results on clinical significance, allele origin, minor allele frequency, and suspected false SNPs. See http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html for more.

From http://www.ncbi.nlm.nih.gov/projects/SNP/docs/rs_attributes.html : Clinical significance: The significance of the indicated allele.

The supported values are:

unknown 
untested
non-pathogenic
probable-non-pathogenic
probable-pathogenic
pathogenic
drug-response
histocompatibility
other

In dbSNP build 132, there are 13,105 such rs entries. While no good definition of "clinical significance" is given, the values listed above show what NCBI classifies as such and help to form a picture of what is meant by the term.

Edit added 13 Oct 2011: I have just learned from following the International Congress of Human Genetics meeting on Twitter that Rong Chen is painstakingly manually curating 5,478 disease-SNP association papers and adding the info to a database of 67,678 SNPs associated with 1,563 diseases.

13.2 years ago

What do you mean by clinical association? What are your criteria?

Mendelian disease, complex disease, pharmacogenomic variants, or a combination of two or more?*

If you are interested in a combined dataset, you need to do some raw-data munging. OMIM is ideal for Mendelian variants; for complex disease variants you should check GWAS resources; for pharmacogenomic variants, check PharmGKB. To identify clinically associated variants from GWAS, see my discussion 1, 2 and 3. For pharmacogenomic variants, see the list of Annotated SNPs by Disease in PharmGKB here. A combination of the three resources will give you the broadest coverage of SNPs for your study.

*I recently integrated such a data-set for a manuscript using the approach discussed above.
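
As a rough illustration of the munging step, the sketch below merges three per-resource rs ID lists into one table that records which resources report each SNP. It is not the pipeline used for the manuscript; the file names and rs IDs are hypothetical placeholders, and it assumes each resource has already been reduced to one rs ID per line.

```shell
export LC_ALL=C
tmp=$(mktemp -d)

# Hypothetical per-resource lists, one rs ID per line.
printf '%s\n' rs0000001 rs0000002 > "$tmp/omim.txt"
printf '%s\n' rs0000002 rs0000003 > "$tmp/gwas.txt"
printf '%s\n' rs0000003 rs0000004 > "$tmp/pharmgkb.txt"

# Tag each rs ID with its source, then collapse to one line per rs ID.
for src in omim gwas pharmgkb; do
    awk -v s="$src" '{ print $1 "\t" s }' "$tmp/$src.txt"
done | sort -u |
awk -F '\t' '
    {
        if ($1 in seen) seen[$1] = seen[$1] "," $2   # append another source
        else            seen[$1] = $2                # first source for this rs ID
    }
    END { for (rs in seen) print rs "\t" seen[rs] }
' | sort > "$tmp/combined.txt"
cat "$tmp/combined.txt"
```

Each output line holds an rs ID and a comma-separated list of the resources that contain it, so SNPs supported by more than one resource are easy to spot.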

I'm also interested in what "clinically-associated" means in snp132.

@Vova: Please refer to Larry's answer!
