How to find SNPs within 1 kb of a gene
7.1 years ago
mms140130 ▴ 60

Hello,

I asked a similar question before, but the answer was complicated. Is there an easy way to download such data? I used to use Scandb and download the data from its website, but I found that it uses a different SNP annotation than mine, which is hg19, so the positions are different.

Is there a website where I can download such data as a .txt file?

Thank you,


Hello,

It is not clear to me what you are looking for. Something like the Variant Table that Ensembl provides for every gene?

Please describe in more detail what you have and what you want to get.

fin swimmer

7.1 years ago

Via BEDOPS tools and UCSC data, you can:

1) Get SNPs for your reference genome:

$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp142Common.txt.gz \
    | gunzip -c - \
    | cut -f2,3,4,5,10 - \
    | awk -v OFS="\t" '{ print $1, $2, ($2 + 1), $4, $5 }' - \
    | sort-bed - \
    > hg19.snp142.bed

2) Get gene annotations; for example, from Gencode:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
    | gunzip -c - \
    | gff2bed - \
    | awk -F'\t' '($8 == "gene" && $4 !~ /^exon/)' - \
    | cut -f1-6 - \
    > hg19.gencode19.genes.bed

3) Do a bedmap operation to map hg19 SNPs within 1kb of hg19 gene annotations:

$ bedmap --echo --echo-map-id-uniq --delim '\t' --range 1000 hg19.gencode19.genes.bed hg19.snp142.bed > answer.bed

The file answer.bed contains genes and all rs* IDs of SNPs that fall within 1000 bases upstream or downstream of each gene interval.
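
As a rough illustration of what --range 1000 means, here is an awk-only sketch on invented data. It is not how bedmap is implemented and is no substitute for it: it simply pads each gene interval by a given number of bases on both sides and lists the IDs of SNPs whose one-base intervals overlap the padded interval. The gene name, rs IDs, coordinates, and the function name snps_near_genes are all made up for the example, and the boundary behaviour is an assumption about half-open BED intervals.

```shell
# Hypothetical helper: snps_near_genes GENES.bed SNPS.bed PAD
# Naive scan of every SNP against every gene, for illustration only.
snps_near_genes() {
    awk -v FS="\t" -v OFS="\t" -v pad="$3" '
        # First file: remember every gene interval, padded on both sides.
        NR == FNR {
            n++
            chrom[n] = $1
            lo[n] = ($2 > pad) ? $2 - pad : 0
            hi[n] = $3 + pad
            line[n] = $0
            next
        }
        # Second file: attach each SNP to every padded gene it overlaps.
        {
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 < hi[i] && $3 > lo[i])
                    ids[i] = (ids[i] == "" ? $4 : ids[i] ";" $4)
        }
        # Print each gene line followed by its semicolon-joined SNP IDs.
        END { for (i = 1; i <= n; i++) print line[i], ids[i] }
    ' "$1" "$2"
}

# Invented toy data: one gene and four SNPs around the 1000-base boundary.
tmp=$(mktemp -d)
printf 'chr1\t5000\t6000\tGENE_A\t.\t+\n' > "$tmp/genes.bed"
printf 'chr1\t3999\t4000\trs1\nchr1\t4000\t4001\trs2\nchr1\t6999\t7000\trs3\nchr1\t7000\t7001\trs4\n' > "$tmp/snps.bed"
snps_near_genes "$tmp/genes.bed" "$tmp/snps.bed" 1000
```

With this toy data only rs2 and rs3 are reported for GENE_A; rs1 and rs4 sit just outside the padded interval. Changing the last argument changes the window, which is one reason two answers with different ranges report different numbers of SNPs per gene.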


Is the gene annotation for hg19?


Yes, it is for hg19. You can visit the Gencode site to confirm.


Where can I download sort-bed?


It is part of the BEDOPS toolkit I linked to. You can go to that link and read instructions on where to get it and how to install it.


I got the following error from the command below:

wget -qO- http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/snp142Common.txt.gz \
    | gunzip -c - \
    | cut -f2,3,4,5,10 - \
    | sort-bed - \
    > hg19.snp142.bed

Error on line 1 in -. Genomic end coordinate is less than (or equal to) start coordinate


See the modified answer. I added one to each SNP position so that it conforms to how BED coordinates are represented.
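
To illustrate the fix on made-up rows (the rs IDs and positions below are invented, not taken from the UCSC table): a row whose start equals its end is what sort-bed rejects, and setting the end to start + 1, as the awk step in the answer does, gives every SNP a one-base interval. The function name to_one_base_bed is hypothetical.

```shell
# Hypothetical helper: rewrite (chrom, start, end, name, observed) rows so
# that end = start + 1, mirroring the awk step in the answer.
to_one_base_bed() {
    awk -v FS="\t" -v OFS="\t" '{ print $1, $2, ($2 + 1), $4, $5 }'
}

# Two invented rows: a single-base SNP and a row with start == end.
printf 'chr1\t100\t101\trsA\tA/G\nchr1\t200\t200\trsB\t-/T\n' | to_one_base_bed
```

The second row comes out as chr1, 200, 201, so its end coordinate is now greater than its start and it can be sorted like any other BED interval.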


Thank you, Alex. Can I ask whether this solution gives the same output as the one in C: Download snps within 1Mb from gene, with 1 Mb changed to 1 kb?

Your solution here gave me a small number of SNPs per gene, whereas the one in C: Download snps within 1Mb from gene gave me more SNPs per gene, even though I applied 1 kb in both. What is the difference?

Which code should I trust?


It looks like you asked that question, too. The options specified are different, the reference genomes are different (so the inputs are different), and the two answers use different ranges. I would suggest reviewing the linked documentation to understand the options, reviewing the inputs to verify them, and then deciding which answer you trust, if any, based on the options you're using and the inputs you're providing.
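
One way to review and compare results is to count how many IDs each gene received. A minimal sketch, assuming the layout produced by the bedmap command in the answer above: six gene columns, then a tab, then the rs IDs joined by semicolons in column 7 (the semicolon separator is an assumption about bedmap's default for multiple IDs). The sample rows and the function name count_snps_per_gene are invented.

```shell
# Hypothetical helper: print "gene_id<TAB>number_of_IDs" for each row of a
# bedmap-style result whose column 7 holds semicolon-joined SNP IDs.
# Reads the files given as arguments, or standard input if there are none.
count_snps_per_gene() {
    awk -v FS="\t" -v OFS="\t" '{
        # An empty column 7 means no SNPs were mapped to this gene.
        n = ($7 == "") ? 0 : split($7, ids, ";")
        print $4, n
    }' "$@"
}

# Invented rows: one gene with two SNP IDs and one gene with none.
printf 'chr1\t5000\t6000\tGENE_A\t.\t+\trs2;rs3\nchr1\t9000\t9500\tGENE_B\t.\t-\t\n' | count_snps_per_gene
```

Running the same count on the output of each approach (for example, count_snps_per_gene answer.bed) makes it easier to see which genes differ and then check those inputs by hand.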


Thanks for your comment, it helped.

Can I ask what the difference is between snp147.txt and snp142Common.txt?

Which one should I choose?
