Hi,
where can I download snps that are within 1Mb from gene? I have a list of genes and need to find the snps within 1Mb for each gene
Hi,
where can I download snps that are within 1Mb from gene? I have a list of genes and need to find the snps within 1Mb for each gene
Assuming you have the positions of your genes in a sorted BED file, you can skip the first step of downloading genes and filtering them for those of interest.
Download a list of genes:
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.basic.annotation.gff3.gz \
| gunzip -c \
| convert2bed --input=gff - \
> gencode.v26.bed
Then grep
for your genes of interest (names should match Gencode gene names, ENSG*
etc.):
$ LC_ALL=C && grep -F -f genes-of-interest.txt gencode.v26.bed > gencode.v26.goi.bed
Once you have genes of interest and their positions in a sorted BED file, you can grab the SNPs and put those into BED format:
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/snp147.txt.gz \
| gunzip -c \
| awk -v OFS="\t" '{ print $2,$3,($3+1),$5 }' \
| sort-bed - \
> hg38.snp147.bed
Now you have genes, SNPs, and their positions. You can map SNPs to genes within a 1M window:
$ bedmap --echo --echo-map-id --range 1000000 --delim '\t' gencode.v26.goi.bed hg38.snp147.bed > answer.bed
The file answer.bed
will contain the gene records on each line. The last column of each line will be a tab-delimited list of SNP IDs, which overlap the gene within 1Mb.
Those look like HGNC names. If you're working with hg38
:
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/refGene.txt.gz \
| gunzip -c \
| awk -v OFS="\t" '($9>1){ print $3,$5,$6,$13,$9,$4 }' \
| sort-bed - \
> hg38.hgnc.bed
Then grep
as before:
$ LC_ALL=C && grep -F -f genes-of-interest.txt hg38.hgnc.bed > hg38.hgnc.goi.bed
The rest is the same.
Hi, You can use BiomaRt package to grab this information
library(biomaRt)
listMarts()
snps<-useMart("ENSEMBL_MART_SNP")
listDatasets(snps)
snps<-useDataset("hsapiens_snp",mart = snps)
listFilters(snps)
snps_info<-getBM(filters="chromosomal_region",attributes=c("refsnp_id","chr_name","allele","minor_allele","minor_allele_freq","clinical_significance","polyphen_prediction","polyphen_score","sift_prediction","sift_score"),values=coordinates,mart=snps)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
actually I applied your code it gave me one vector of snps while I need for each gene the snps that is within 1mb
Please use
ADD COMMENT/ADD REPLY
to properly place this comment in the correct answer thread below. This helps keep threads logically organized.Hello mms140130!
Questions similar to yours can already be found at:
We have closed your question to allow us to keep similar content in the same thread.
If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.
Cheers!
I don't think it is the same thing