Via BEDOPS tools and UCSC data, you can:
1) Get SNPs for your reference genome:
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp142Common.txt.gz \
| gunzip -c - \
| cut -f2,3,4,5,10 - \
| awk -v OFS="\t" '{ print $1, $2, ($2 + 1), $4, $5 }' - \
| sort-bed - \
> hg19.snp142.bed
2) Get gene annotations; for example, from Gencode:
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
| gunzip -c - \
| gff2bed - \
| awk '($8 == "gene" && $4 !~ /^exon/)' - \
| cut -f1-6 - \
> hg19.gencode19.genes.bed
3) Do a bedmap operation to map hg19 SNPs within 1 kb of hg19 gene annotations:
$ bedmap --echo --echo-map-id-uniq --delim '\t' --range 1000 hg19.gencode19.genes.bed hg19.snp142.bed > answer.bed
The file answer.bed contains each gene, followed by the rs* IDs of all SNPs that fall within 1000 bases upstream or downstream of that gene's interval.
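To get a quick feel for the result, it may help to count how many SNP IDs were mapped to each gene. The following is only a sketch on made-up toy data: the file name toy.answer.bed, the gene names, the rs IDs, and the helper count_snps_per_gene are all hypothetical. It assumes the layout produced by the bedmap call above (six gene columns, then a tab, then a semicolon-delimited list of IDs) and that a gene with no nearby SNPs has an empty seventh column.

```shell
# Toy stand-in for answer.bed (hypothetical values): six BED columns from the
# gene file, then a seventh column of semicolon-delimited rs IDs.
printf 'chr1\t11868\t14412\tGENE_A\t.\t+\trs111;rs222;rs333\n'  > toy.answer.bed
printf 'chr1\t29553\t31109\tGENE_B\t.\t+\t\n'                  >> toy.answer.bed

# Hypothetical helper: print each gene ID with its number of mapped SNP IDs.
# An empty seventh column is counted as zero SNPs.
count_snps_per_gene() {
    awk -F'\t' -v OFS='\t' '{
        n = ($7 == "") ? 0 : split($7, ids, ";")
        print $4, n
    }' "$1"
}

count_snps_per_gene toy.answer.bed
```

On the real output you would call count_snps_per_gene answer.bed instead, and could pipe the result through sort -k2,2nr to list the genes with the most nearby SNPs first.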
Hello,
it is not clear to me what you are looking for. Something like the Variant Table that Ensembl provides for every gene?
Please describe in more detail what you have and what you want to get.
fin swimmer