Hi all,
I want to ask whether there is a database that stores human disease-related SNPs. I want to obtain the SNPs located in gene promoter regions. Can anyone help with this?
Thanks very much.
Cam
Say you're working with hg19.
Grab SNP entries from NCBI and convert them to sorted BED with vcf2bed:
$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz \
| gunzip -c - \
| convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - \
| awk '{ print "chr"$0 }' - \
> hg19.snp151.bed
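Or use whatever subset or other source of SNPs you like, and use the command line to turn it into a sorted BED file. Whatever the source, the chromosome names must follow the same convention as the gene annotations (chr1 rather than 1), or no overlaps will be found.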
Grab gene annotations of interest (e.g., GENCODE) and filter for genes into a sorted BED with gff2bed. Use a release built on the same assembly as the SNPs; GENCODE release 19 is the last one on GRCh37/hg19 (release 21 is on GRCh38):
$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gff3.gz \
| gunzip -c - \
| gff2bed - \
| awk '$8=="gene"' - \
> genes.bed
Say we define proximal promoters as the 1 kb region immediately upstream of each gene's start, taking strand into account. We can process the file genes.bed per strand and generate promoter regions:
$ awk -v OFS="\t" '{ \
if ($6=="+") { \
start = ($2 > 1000) ? $2 - 1000 : 0; \
print $1, start, $2, $4, $5, $6; \
} \
else { \
print $1, $3, $3 + 1000, $4, $5, $6; \
} \
}' genes.bed \
| sort-bed - \
> promoters.bed
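To see what the per-strand rule does, here is a small sketch on two made-up genes (the names geneA and geneB, the coordinates, and the toy_ file names are all hypothetical). It also clamps the start coordinate at zero, which is a precaution for genes near the beginning of a chromosome rather than something the question requires.

```shell
# Toy input: one gene per strand (columns: chrom, start, end, id, score, strand).
# All values here are invented for illustration.
printf 'chr1\t5000\t9000\tgeneA\t.\t+\nchr1\t20000\t24000\tgeneB\t.\t-\n' > toy_genes.bed

# "+" strand: the window ends at the gene start.
# "-" strand: the window begins at the gene end.
awk -v OFS='\t' '{
  if ($6 == "+") { s = ($2 > 1000) ? $2 - 1000 : 0; print $1, s, $2, $4, $5, $6 }
  else           { print $1, $3, $3 + 1000, $4, $5, $6 }
}' toy_genes.bed > toy_promoters.bed

cat toy_promoters.bed
```

For geneA the window is 4000-5000, and for geneB it is 24000-25000, i.e. upstream in the direction of transcription in both cases.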
Finally, we map SNP IDs to promoters with bedmap (both input files must be sorted BED):
$ bedmap --echo --echo-map-id-uniq --delim '\t' promoters.bed hg19.snp151.bed > snps_over_promoters.bed
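If a flat table of gene/SNP pairs is easier to work with, the result can be reshaped with awk. This is a rough sketch that assumes the output has the six promoter columns followed by a seventh column of semicolon-separated SNP IDs, left empty when nothing overlaps. The toy file below stands in for snps_over_promoters.bed, and its rsIDs are made up.

```shell
# Toy stand-in for snps_over_promoters.bed (hypothetical values).
# geneA has two overlapping SNPs; geneB has none, so its 7th column is empty.
printf 'chr1\t4000\t5000\tgeneA\t.\t+\trs111;rs222\nchr1\t24000\t25000\tgeneB\t.\t-\t\n' \
  > toy_snps_over_promoters.bed

# Skip promoters without SNPs, then print one "gene <tab> rsID" row per SNP.
awk -F'\t' -v OFS='\t' '$7 != "" {
  n = split($7, ids, ";")
  for (i = 1; i <= n; i++) print $4, ids[i]
}' toy_snps_over_promoters.bed > toy_gene_snp_pairs.tsv

cat toy_gene_snp_pairs.tsv
```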
This would just find all SNPs in promoter regions, though. To get disease-associated SNPs, you would have to draw your SNPs from the NHGRI-EBI GWAS Catalog or ClinVar.
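One way to do that, sketched here on toy data: export a plain list of disease-associated rsIDs from one of those resources, then keep only the SNP BED rows whose ID is on the list. The file names and rsIDs below are hypothetical, and how the list is exported depends on the resource.

```shell
# Hypothetical list of disease-associated rsIDs, one per line.
printf 'rs222\nrs999\n' > toy_disease_rsids.txt

# Toy sorted SNP BED (columns: chrom, start, end, rsID); values are invented.
printf 'chr1\t4100\t4101\trs111\nchr1\t4500\t4501\trs222\nchr1\t24500\t24501\trs333\n' \
  > toy_snps.bed

# First file: load the rsIDs into a lookup table.
# Second file: keep rows whose 4th column is in that table.
awk -F'\t' 'NR == FNR { keep[$1]; next } $4 in keep' \
  toy_disease_rsids.txt toy_snps.bed > toy_disease_snps.bed

cat toy_disease_snps.bed
```

Because the filter keeps rows in their original order, a sorted input stays sorted, so the filtered file could be passed to bedmap in place of the full SNP set.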
http://www.hsls.pitt.edu/obrc/index.php?page=URL1151420236
http://genome.ufl.edu/mapper/