Question

how to find SNP positions (for non-bioinformaticians)

1

9.9 years ago

CrazyB ▴ 280

First off, I apologize for posting this "old" inquiry. I know similar inquiries were put out before, but I am hoping to find new solutions to this inquiry.

I am trying to find the positions of a list of SNPs (given the rs#). Need a "new" solution.

What I have tried so far -

(a) sending a batch query to dbSNP at NCBI, which worked well in the past, but today, ~10 hr after sending the batch query, no results have come back yet (is the server down?)

(b) downloading all dbSNP positions from Biomart and hoping to do some "intersection" to find the positions for specific rs#. The download somehow was terminated prematurely (first download took ~ 1+ hr).

(c) downloading cruzdb. cruzdb was suggested as a solution in one of the earlier posts. I read the documentation and still could not run it, my apologies! (Does running cruzdb require an understanding of Python, which I currently don't have?) That said, in contrast to the cruzdb docs, I had better luck with vcftools and plink thanks to their "more friendly" documentation.

Are there any other solutions that allow non-bioinformaticians to do this task (i.e. find positions for a list of SNPs)?

I certainly hope to get some useful responses, but it's understandable if the admin chooses to close this thread (possibly due to "duplication of questions"). Thank you

snp dbsnp BioMart • 8.0k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 9.9 years ago by CrazyB ▴ 280

0

How many rs# have you got? You can go to the UCSC table browser, give it a list of rsIDs (< 1000) and select whatever information you need in the output file.

ADD REPLY • link updated 2.3 years ago by Ram 45k • written 9.9 years ago by Ashutosh Pandey 12k

0

Thanks. Will try the UCSC table browser and see how it runs. I have only ~1000 rs IDs, so I wasn't sure why the dbSNP database failed me yesterday.

ADD REPLY • link 9.9 years ago by CrazyB ▴ 280

4

9.9 years ago

Emily 24k

Don't download all the variants from BioMart, whatever you do! There are >114M variants in human and BioMart cannot handle a download of that size, which is why it's failing. Use the Variation database and filter by Variation name, then input your list of IDs.

ADD COMMENT • link 9.9 years ago by Emily 24k

2

9.0 years ago

Alex Reynolds 36k

You can use the mysql client to download a BED file containing the SNP position and rs* ID:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'SELECT chrom, chromStart, chromEnd, name FROM snp144Common' > snp144Common.bed

This download took about 4-5 minutes to complete.

The complete schema for the snp144Common table is available from UCSC. In the example above, we retrieve data for the chrom, chromStart, chromEnd and name fields. You can add other fields to the example command if they are useful to you, such as the observed and func annotations.

Once you have a list of SNPs, you can use awk to find the position of a single SNP, given the ID.

For example:

$ awk -v id='rs10409603' '{ if ($4 == id) { print $0; exit; } }' snp144Common.bed
chr19   8313572 8313573 rs10409603
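
A note on coordinates, as a minimal sketch: UCSC tables store chromStart as a 0-based start and chromEnd as the exclusive end, so for a single-base SNP the familiar 1-based position is the chromEnd value. The awk filter below assumes tab-separated BED rows like the one above and deliberately skips any row that is not exactly one base long (insertions and deletions need separate handling).

```shell
# One BED row in the form shown above (tab-separated, 0-based start).
pos=$(printf 'chr19\t8313572\t8313573\trs10409603\n' |
  awk 'BEGIN { OFS = "\t" }
       # Single-base SNP: end - start == 1, so the 1-based position is column 3.
       $3 - $2 == 1 { print $4, $1, $3 }')

# Prints: ID, chromosome, 1-based position.
echo "$pos"
```

To run this on a real file, replace the printf with the path to your BED file as the argument to awk.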

If you have a list of IDs, you can use grep -F -w -f <filename> and pass in a file containing a list of IDs to do fixed-string (quick) searches against. The -w option restricts matches to whole words, so that an ID such as rs123 does not also pull in rs1234.

For example:

$ grep -F -w -f list-of-SNP-IDs.txt snp144Common.bed > answer.bed
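
An alternative, sketched here under some assumptions, is to let awk compare the ID column exactly and also write out which IDs were not found. The file names, the temporary directory and the two-row sample data are made up for illustration; in practice you would point awk at the real snp144Common.bed and your own ID list (one rs ID per line).

```shell
# Tiny stand-in files so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'chr19\t8313572\t8313573\trs10409603\n' >  "$workdir/snps.bed"
printf 'chr1\t100\t101\trs104096030\n'         >> "$workdir/snps.bed"
printf 'rs10409603\nrs999999999\n'             >  "$workdir/ids.txt"

# First file (ids.txt): remember every wanted ID.
# Second file (snps.bed): print rows whose 4th column is exactly a wanted ID.
# At the end: write wanted IDs that never appeared to missing.txt.
awk -v missing="$workdir/missing.txt" '
    NR == FNR    { want[$1] = 1; next }
    ($4 in want) { print; seen[$4] = 1 }
    END          { for (id in want) if (!(id in seen)) print id > missing }
' "$workdir/ids.txt" "$workdir/snps.bed" > "$workdir/answer.bed"

cat "$workdir/answer.bed"
```

Because the comparison is on the whole fourth column, the made-up ID rs104096030 is not returned for rs10409603, and rs999999999 ends up in missing.txt. If every ID is found, missing.txt is simply not created.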

Learning a few basics of working on the command line will pay massive dividends in the long term.

ADD COMMENT • link 9.0 years ago by Alex Reynolds 36k

0

Could you tell me what the difference is between snp144.txt.gz and snp144Common? The former contains more than 130 million SNPs while snp144Common only contains 14760200 SNPs. Thank you very much!

ADD REPLY • link 5.8 years ago by yliueagle ▴ 290

0

I think this page answers my question: http://genome.ucsc.edu/goldenPath/newsarch.html Thank you!

ADD REPLY • link 5.8 years ago by yliueagle ▴ 290

0

9.2 years ago

Pierre Lindenbaum 166k

For non-bioinformaticians: I would use Knime.org

Download and open the knime workbench (http://www.knime.org/downloads/overview) and create a new workflow.
Download the SNP table from UCSC (http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp141.txt.gz) and open it in the workflow ('read File' node).
Load your list of SNP names using a second 'Read file' node.
Use a 'join Node' to intersect the two previous nodes on the SNP name.

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by Pierre Lindenbaum 166k

0

9.0 years ago

Ibrahim Tanyalcin ★ 1.2k

Dear,

I wrote a piece of software for myself a year ago for visualizing SNVs for a specific gene name. Whether you use a VCF file or a variant file from Biomart, you can generate these graphs for a given gene. If your SNPs/SNVs are gene based, you can easily generate graphs like this:

http://i-pv.org/EGFR.html

http://i-pv.org/JAK2.html

Maybe it helps,

ADD COMMENT • link 9.0 years ago by Ibrahim Tanyalcin ★ 1.2k

Ram · Accepted Answer · 2015-06-03

Hi, as suggested by Ashutosh Pandey, you can use the UCSC table browser. Select the genome and assembly of interest, and from the "group" menu select "Variation". From the "track" menu select "All SNPs(142)". Paste or upload your rs IDs using the buttons at "identifiers (names/accessions)". To export your results, select "selected fields from primary and related tables" as the output format, then click "get output". In the next step you can select the fields of interest (i.e. input rs IDs, chromosome, genomic position) that will be included in the final output table, then click "get output" again to retrieve it.