Question

How To Map A Snp To A Gene Around +/- 60Kb ?

22

Entering edit mode

15.0 years ago

Khader Shameer 18k

I am looking at a bunch of SNPs. Some of them are part of genes, but other are not. I am interested to look up +60KB or -60KB of those SNPs to get details about some nearby genes. Please share your experience in dealing with such a situation or thoughts on any methods that can do this. Thanks in advance.

snp data human annotation • 27k views

ADD COMMENT • link updated 15.0 years ago by Aaronquinlan 12k • written 15.0 years ago by Khader Shameer 18k

0

Entering edit mode

If you just want gene names within 60kb of the SNP:

$ bedmap --echo --echo-map-id-uniq --range 60000 <(vcf2bed snps.vcf) genes.bed

If you want their locations, also:

$ bedmap --echo --echo-map --range 60000 <(vcf2bed snps.vcf) genes.bed

These are operations to retrieve genes within 60kb of a SNP.

If you want the very nearest gene to a SNP, you can use closest-features:

$ closest-features --closest --dist <(vcf2bed snps.vcf) genes.bed > answer.bed

This gives you, for each SNP, its nearest gene and the distance value in bases.

These and other genomic set operation tools are part of the BEDOPS toolkit.

ADD REPLY • link 8.0 years ago by Alex Reynolds 36k

Ram · Answer 1 · 2010-03-22

26

Entering edit mode

15.0 years ago

Pierre Lindenbaum 165k

Using the UCSC mysql server:

~> mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select
 K.proteinID,
 K.name,
 S.name,
 S.avHet,
 S.chrom,
 S.chromStart,
 K.txStart,
 K.txEnd
from snp130 as S
left join knownGene as K on
 (S.chrom=K.chrom and not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart))
where
 S.name in ("rs25","rs100","rs75","rs9876","rs101")'

result:

+-----------+------------+--------+----------+-------+------------+----------+----------+
| proteinID | name       | name   | avHet    | chrom | chromStart | txStart  | txEnd    |
+-----------+------------+--------+----------+-------+------------+----------+----------+
| NULL      | NULL       | rs100  |        0 | chr7  |   24438348 |     NULL |     NULL |
| NULL      | NULL       | rs101  |        0 | chr7  |   24438147 |     NULL |     NULL |
| NP_056019 | uc003ssf.3 | rs25   | 0.499586 | chr7  |   11584141 | 11414172 | 11871824 |
| NP_056019 | uc003ssf.3 | rs75   | 0.241967 | chr7  |   11613691 | 11414172 | 11871824 |
| B2RNV1    | uc003tnv.2 | rs9876 | 0.426096 | chr7  |   47315290 | 47314752 | 47579199 |
| B2RNV1    | uc003tnw.2 | rs9876 | 0.426096 | chr7  |   47315290 | 47314752 | 47621742 |
+-----------+------------+--------+----------+-------+------------+----------+----------+

ADD COMMENT • link updated 6.6 years ago by Ram 45k • written 15.0 years ago by Pierre Lindenbaum 165k

4

Entering edit mode

Pierre, this is great solution. I completely forgot about mysql server. However, I'd recommend to use refFlat table instead of knownGene and K.geneName instead of K.proteinID. In this case you will get HUGO gene name. Anyway, it's up to OP.

ADD REPLY • link 15.0 years ago by Yuri ★ 1.7k

1

Entering edit mode

Very nice solution - anything that takes us away from ad hoc Perl scripts is good!

ADD REPLY • link 15.0 years ago by Mikael Huss 4.8k

1

Entering edit mode

Recently, a question similar to this question was asked in BioBB. URL : http://www.bioinformatics.org/pipermail/bbb/2010-April/004849.html

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 15.0 years ago by Khader Shameer 18k

0

Entering edit mode

Excellent ! I was looking for something like this. Thanks a lot Pierre.

ADD REPLY • link 15.0 years ago by Khader Shameer 18k

0

Entering edit mode

Ditto. Very nice solution, I didn't even know that server existed. looks like they may be getting a bit more traffic in the future.

ADD REPLY • link 15.0 years ago by Ian Simpson ▴ 960

0

Entering edit mode

I wanted to limit my results to items with a geneSymbol so I added an extra join:

select S.name as rsId, kgXref.geneSymbol from snp131 as S left join knownGene as K on (S.chrom=K.chrom and not(K.txEnd+60000

ADD REPLY • link 14.0 years ago by Petervermont • 0

0

Entering edit mode

I found the UCSC server unusably slow. I used the UCSC Table browser to export tab delimited files. Bu picking 'selected fields from primary and related tables' I was able to just get the fields I wanted. I then used Toad for MySQL (Freeware on Windows) to import the flat file to a local db in a single step.

ADD REPLY • link 14.0 years ago by Petervermont • 0

0

Entering edit mode

I found the UCSC server unusably slow. I used the UCSC Table browser to export tab delimited files for the snp131 and refFlat (per Yuri above) tables. By choosing 'selected fields from primary and related tables' from drop down menu I was able to get only the the fields I wanted, thereby saving time and disk space. I then used Toad for MySQL (Freeware on Windows) to import the flat files to a local db in a single step each.

ADD REPLY • link 14.0 years ago by Petervermont • 0

0

Entering edit mode

quite old but still - doesn't "not(K.txEnd+60000<S.chromStart" also mean "all genes that end more than 60kb away from snp" to the right?

ADD REPLY • link 13.5 years ago by Dima Lvov ▴ 40

0

Entering edit mode

'not(K.txEnd+60000<S.chromStart or S.chromEnd+60000<K.txStart)' means if the gene end +60KB is before the SNP or the gene start - 60Kb if after the SNP the DO NOT accept the SNP.

ADD REPLY • link 13.5 years ago by Pierre Lindenbaum 165k

0

Entering edit mode

Sorry, could you please tell me exactly what I have to type on the command line? If I copy this mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A in MySQL 5.7 Command Line Client I get an error in SQL syntax...thank you

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.2 years ago by viola.grugni • 0

0

Entering edit mode

2019-11-20: the sql server has changed to mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306

ADD REPLY • link 5.4 years ago by Pierre Lindenbaum 165k

Ram · Answer 2 · 2010-07-19

8

Entering edit mode

14.7 years ago

Aaronquinlan 12k

I imagine you have this sorted out by now, but I thought I'd add that assuming your SNPs are in BED/ROD or VCF format and your genes are in BED12 or GFF format, then you can easily do this with a single command in my BEDTools suite. The relevant command would be:

windowBed -a snps.bed -b genes.bed -w 60000 > snps.withGenesWithin60Kb.bed

ADD COMMENT • link updated 6.6 years ago by Ram 45k • written 14.7 years ago by Aaronquinlan 12k

0

Entering edit mode

Thanks for adding your answer. Interesting to note the BEDTools. I will go through it.

ADD REPLY • link 14.7 years ago by Khader Shameer 18k

0

Entering edit mode

Hi Khader, Just wondering if you managed to try it with BEDTools? I'm trying to do a similar thing but not very familiar with mySQL. Also, may I just ask on what basis/why you chose a +/- 60kb window size? Thanks!

ADD REPLY • link 9.1 years ago by noor.hidayatul • 0

score 5 · Answer 3 · 2010-03-22

5

Entering edit mode

15.0 years ago

Michael 55k

No need to program this. This can be done as a BioMart query in MartView:

rs8 with 60000 bases up/donw-flanks

I just checked it works with that size flanks. The result is a FASTA file.

On the other hand, if you have a SNP position it could be easier to simply search for the genes directly in the genome annotation. Get a genome annotation as a tab separated file from e.g. BioMart and search for the genes with start/stop positons within this range.

ADD COMMENT • link 15.0 years ago by Michael 55k

0

Entering edit mode

Thanks Michael.

ADD REPLY • link 15.0 years ago by Khader Shameer 18k

0

Entering edit mode

I am afraid this is not exactly I am looking for. I wanted to check if a particular SNP is not mapped to a gene, whether there will be any known genes in +/- 60KB of that SNP. This particular BioMart query is fetching the sequence not annotation. I have tried one with annotation and it works. : http://www.biomart.org/biomart/martview/c26efcafd362abccc4af83039c441762. But sometimes BioMart out put requires lot of post-processing. Also it will be nice if BioMart can have options to select fields multiple attributes.

ADD REPLY • link 15.0 years ago by Khader Shameer 18k

0

Entering edit mode

Yes, sorry, I misunderstood your question. What Pierre described is much better here.

ADD REPLY • link 15.0 years ago by Michael 55k

score 2 · Answer 4 · 2010-03-22

2

Entering edit mode

15.0 years ago

Yuri ★ 1.7k

I downloaded refFlat table from UCSC Genome Browser, which contains gene symbols and their genomic locations. Then mapped SNPs to genes. I did it in MATLAB, but can be done with any language.

You can also try a Perl script from here.

ADD COMMENT • link 15.0 years ago by Yuri ★ 1.7k