Getting The Alleles Of Specific SNPs For 37:GRCh37
14.2 years ago
Emma ▴ 140

Hi all,

I am trying to get the alleles and frequencies of some SNPs (from across the genome) for the assembly 37:GRCh37 (positive strand). I thought the easiest way would be to download the frequency data from HapMap and then look up my SNPs, but they only have data up to build 36. I also tried sending a batch query to NCBI, but they don't support files as large as mine (I would have to cut it into chunks), and they return far more information than I need (genotypes for all submitted data, all existing populations, etc.). I'm only interested in CEU, and the frequencies from HapMap are more than good enough for my purposes. I'm thinking there must be an easier way to do it than the batch query. All ideas are welcome!

Thanks!

snp genome allele • 4.2k views
14.2 years ago

You can try your search with BioMart or HapMart, a BioMart-based interface for data mining targeted at HapMap data. If you are new to BioMart, you may start with this article; a variety of documents for getting started with BioMart, including video tutorials, are available here.


Emma, BioMart has all the data that you need (i.e. SNP information mapped to GRCh37), plus an archive of past mappings. You may have landed on one of the archived mappings by mistake, but if you go to http://www.biomart.org/, select MartView, choose database "Ensembl Variation 59", and choose dataset "Homo Sapiens Variation (dbSNP131)", you will be working with up-to-date information.


Thanks Khader for the Biomart intro.


Thanks, this is a good link to keep in mind for future use. But for now I'm afraid it has the same problem as downloading directly from the HapMap FTP, i.e. it only has release 27 data, not the build that I need.


What Jorge said!

14.2 years ago

You can join two tables on the UCSC public MySQL server: the HapMap CEU data for hg18 and the SNP positions for hg19 (build 37, dbSNP 131):

mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg19

> select S.* from hg18.hapmapSnpsCEU as H, hg19.snp131 as S
  where S.name=H.name limit 2\G
*************************** 1. row ***************************
       bin: 1289
     chrom: chr7
chromStart: 92383887
  chromEnd: 92383888
      name: rs10
     score: 0
    strand: +
   refNCBI: A
   refUCSC: A
  observed: A/C
   molType: genomic
     class: single
     valid: by-cluster,by-frequency,by-submitter,by-hapmap,by-1000genomes
     avHet: 0.028124
   avHetSE: 0.115199
      func: intron
   locType: exact
    weight: 1
*************************** 2. row ***************************
       bin: 1553
     chrom: chr12
chromStart: 126890979
  chromEnd: 126890980
      name: rs1000000
     score: 0
    strand: -
   refNCBI: G
   refUCSC: G
  observed: C/T
   molType: genomic
     class: single
     valid: by-cluster,by-frequency,by-2hit-2allele,by-hapmap,by-1000genomes
     avHet: 0.308102
   avHetSE: 0.243155
      func: unknown
   locType: exact
    weight: 1

Emma, if you had a local installation of the UCSC databases, the best way would be to load your rs## into a third database and join it with the others. Using the SQL query, you can instead store the results in a file, sort that file on the rs name, sort your rs list, and join the two files with unix join.
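
A minimal sketch of that sort-and-join step, under some assumptions: the file names (snps.tsv, my_rs.txt) are hypothetical, the two data rows are copied from the rs10 and rs1000000 output above, and rs999 is a made-up ID included only to show that unmatched IDs are dropped. In practice snps.tsv would be produced by running the query in batch mode, as hinted in the comment, rather than typed by hand.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sort and join must agree on collation order, so pin the locale.
export LC_ALL=C
TAB=$(printf '\t')

# Stand-in for the query results (name, chrom, chromStart, strand, observed).
# Assumption: the real file would come from something like (not run here):
#   mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg19 -N -B \
#     -e 'select S.name, S.chrom, S.chromStart, S.strand, S.observed
#         from hg18.hapmapSnpsCEU as H, hg19.snp131 as S
#         where S.name=H.name' > snps.tsv
printf 'rs1000000\tchr12\t126890979\t-\tC/T\n' >  snps.tsv
printf 'rs10\tchr7\t92383887\t+\tA/C\n'        >> snps.tsv

# Your own list: one rs ID per line (rs999 is a made-up ID with no match).
printf 'rs1000000\nrs999\nrs10\n' > my_rs.txt

# Sort both files on the rs name (column 1).
sort -t "$TAB" -k1,1 snps.tsv  > snps.sorted.tsv
sort -k1,1 my_rs.txt           > my_rs.sorted.txt

# Keep only the rows whose rs name appears in your list.
join -t "$TAB" my_rs.sorted.txt snps.sorted.tsv > joined.tsv
cat joined.tsv
```

The mysql flags in the comment (-N to drop the header row, -B for tab-separated batch output, -e to pass the query) are standard client options, but the unrestricted join may be slow on the public server, so treat that part as an untested suggestion.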


I haven't used MySQL before, so my question is probably naive. I have around 40,000 SNPs for which I need the strand, observed alleles and frequencies. Can I upload/input the rs# that I need and output the results to a text file? Thanks for the idea, it looks like it's probably the way to go.
