Download snps within 1Mb from gene
2
0
Entering edit mode
7.2 years ago
mms140130 ▴ 60

Hi,

where can I download snps that are within 1Mb from gene? I have a list of genes and need to find the snps within 1Mb for each gene

gene SNP • 4.2k views
ADD COMMENT
0
Entering edit mode

actually I applied your code it gave me one vector of snps while I need for each gene the snps that is within 1mb

ADD REPLY
0
Entering edit mode

Please use ADD COMMENT/ADD REPLY to properly place this comment in the correct answer thread below. This helps keep threads logically organized.

ADD REPLY
0
Entering edit mode

Hello mms140130!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

ADD REPLY
0
Entering edit mode

I don't think it is the same thing

ADD REPLY
3
Entering edit mode
7.2 years ago

Assuming you have the positions of your genes in a sorted BED file, you can skip the first step of downloading genes and filtering them for those of interest.

Download a list of genes:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.basic.annotation.gff3.gz \
    | gunzip -c \
    | convert2bed --input=gff - \
    > gencode.v26.bed

Then grep for your genes of interest (names should match Gencode gene names, ENSG* etc.):

$ LC_ALL=C && grep -F -f genes-of-interest.txt gencode.v26.bed > gencode.v26.goi.bed

Once you have genes of interest and their positions in a sorted BED file, you can grab the SNPs and put those into BED format:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/snp147.txt.gz \
   | gunzip -c \
   | awk -v OFS="\t" '{ print $2,$3,($3+1),$5 }' \
   | sort-bed - \
   > hg38.snp147.bed

Now you have genes, SNPs, and their positions. You can map SNPs to genes within a 1M window:

$ bedmap --echo --echo-map-id --range 1000000 --delim '\t' gencode.v26.goi.bed hg38.snp147.bed > answer.bed

The file answer.bed will contain the gene records on each line. The last column of each line will be a tab-delimited list of SNP IDs, which overlap the gene within 1Mb.

ADD COMMENT
0
Entering edit mode

is this Plink or what ?

which software is this?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

how to make the code read my genename.txt ?

upload it ? they are like the following , is this the names match Gencode gene names?

A1BG
A1CF
A2M
A4GNT
AAAS
AADAC
AADACL2
AADAT
AAK1
AAMP
ADD REPLY
1
Entering edit mode

Those look like HGNC names. If you're working with hg38:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenpath/hg38/database/refGene.txt.gz \
    | gunzip -c \
    | awk -v OFS="\t" '($9>1){ print $3,$5,$6,$13,$9,$4 }' \
    | sort-bed - \
    > hg38.hgnc.bed

Then grep as before:

$ LC_ALL=C && grep -F -f genes-of-interest.txt hg38.hgnc.bed > hg38.hgnc.goi.bed

The rest is the same.

ADD REPLY
0
Entering edit mode

ok good but how to input the gene-of-interest .txt it give me no No such file or directory

ADD REPLY
0
Entering edit mode

I don't understand your question. You already have a text file containing the genes you are interested in, correct? Your original question says you have a list of genes.

ADD REPLY
0
Entering edit mode

I have the file I'm using terminal in mac to excuse the code , when I excuse your code it tells me no such file or directory

I have my file on the desktop

ADD REPLY
0
Entering edit mode

I just added the location of the file next to the file name is this correct

ADD REPLY
0
Entering edit mode

I'm afraid I don't understand. Check that you have the files listed in these commands, perhaps?

ADD REPLY
0
Entering edit mode

No problem, I figured it out

by the way the answer.bed file how can I download it to my desk top on my laptop ?

ADD REPLY
1
Entering edit mode

Hi mms140130,

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

Hi Alex, is it possible to change the .bed file to .csv ? or .txt ?

ADD REPLY
0
Entering edit mode

After you have made a final BED file, you could convert it to CSV with something like:

$ cat in.bed | tr '\t' ',' > in.csv

Otherwise, it is already a (tab-delimited) text file. If you need to rename it from some-file.bed to some-file.txt, use the mv command.

ADD REPLY
0
Entering edit mode
7.2 years ago
tarek.mohamed ▴ 370

Hi, You can use BiomaRt package to grab this information

library(biomaRt)

 listMarts()

snps<-useMart("ENSEMBL_MART_SNP") 

listDatasets(snps)

Then you choose whatever data-set you are interested in, lets say "hsapiens_snp" which includes Human Short Variants (SNPs and indels excluding flagged variants) (GRCh38.p10).

 snps<-useDataset("hsapiens_snp",mart = snps)
  listFilters(snps)

Then you can choose "chromosaomal_region" filter, to get the all information you need about SNPs in that region.

P.S. you have to get the coordinates for your genes first, then put this coordinates in an object (name it for example coordinates)

snps_info<-getBM(filters="chromosomal_region",attributes=c("refsnp_id","chr_name","allele","minor_allele","minor_allele_freq","clinical_significance","polyphen_prediction","polyphen_score","sift_prediction","sift_score"),values=coordinates,mart=snps)
ADD COMMENT
0
Entering edit mode

OK , and how to specify the 1Mb from the gene ?

ADD REPLY
0
Entering edit mode

If you have the coordinates (chromosomal region) of your genes, you can just add 1Mb to the start and the end coordinates.

ADD REPLY
0
Entering edit mode

actually I applied your code it gave me one vector of snps while I need for each gene the snps that is within 1mb

ADD REPLY

Login before adding your answer.

Traffic: 3038 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6