You can use BEDOPS tools to query BED files that contain promoter, enhancer, TFBS and gene annotations, against a BED-formatted file that shows the position(s) of your SNP(s).
First, you could get a full list of SNPs into BED format.
Let's say you are using genome build hg19:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'SELECT chrom, chromStart, chromEnd, name FROM snp141Common' | tail -n +2 | sort-bed - > snp141Common.bed
You might filter this for your SNP of interest (e.g., rs937395):
$ grep -F 'rs937395' snp141Common.bed > rs937395.bed
If you have a text file of SNP IDs of interest, you could filter on matches with entries in that file:
$ grep -Ff snps_of_interest.txt snp141Common.bed > snps_of_interest.bed
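One caveat with the grep approach: -F matches substrings, so an ID like rs937395 would also pull in a longer ID such as rs9373951 if one exists. If that matters, here is a minimal sketch that matches the fourth BED column exactly with awk. The function name filter_snps_exact is made up for illustration, and it assumes snps_of_interest.txt holds one rsID per line:

```shell
# Hypothetical helper: keep only BED rows whose 4th column (the rsID)
# is exactly one of the IDs listed in the first file (one ID per line).
filter_snps_exact() {
  ids_file=$1
  bed_file=$2
  awk '
    # First file: remember each non-empty ID.
    FILENAME == ARGV[1] { if (NF) ids[$1]; next }
    # Second file: print rows whose name column is a remembered ID.
    ($4 in ids)
  ' "$ids_file" "$bed_file"
}

# Usage, with the file names from above:
#   filter_snps_exact snps_of_interest.txt snp141Common.bed > snps_of_interest.bed
```

The input is already sorted by sort-bed, and awk keeps the row order, so the result should still be usable by bedmap without re-sorting.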
Next, you might grab annotations of interest.
As an example, let's grab GENCODE v19 records, filter them for genes, and convert the result to BED with the gtf2bed conversion tool:
$ wget -O - ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz \
| gunzip -c \
| awk '$3 == "gene"' \
| gtf2bed \
> gencode.v19.genes.bed
To demonstrate a query, we can use the bedmap tool to look at a 1 kb window around your particular SNP(s) of interest (500 bp on either side, via --range 500), looking for any GENCODE v19 gene ID annotations that overlap that window.
For instance, around rs937395:
$ bedmap --range 500 --echo --echo-map-id-uniq rs937395.bed gencode.v19.genes.bed > answer.bed
Or around all SNPs of interest:
$ bedmap --range 500 --echo --echo-map-id-uniq snps_of_interest.bed gencode.v19.genes.bed > answer.bed
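With these options, each line of answer.bed should be the SNP's BED record, then a pipe character, then a semicolon-separated list of gene IDs (empty if nothing overlapped the window). These are bedmap's default delimiters. If you would rather have one SNP-gene pair per line, a rough sketch with awk could look like the following; the function name split_bedmap_pairs is made up for illustration:

```shell
# Hypothetical helper: turn "chrom<TAB>start<TAB>end<TAB>rsID|gene1;gene2"
# into one "rsID<TAB>gene" line per pair, skipping SNPs with no hits.
split_bedmap_pairs() {
  awk -F'|' '
    $2 != "" {
      split($1, bed, "\t")           # bed[4] is the rsID
      n = split($2, genes, ";")      # mapped gene IDs
      for (i = 1; i <= n; i++) print bed[4] "\t" genes[i]
    }
  ' "$1"
}

# Usage:
#   split_bedmap_pairs answer.bed > snp_gene_pairs.txt
```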
Basically, you repeat and adjust this procedure depending on the window of interest, SNPs of interest, and target annotations of interest.
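If you want to try several window sizes in one go, one possible way is a small loop around the same bedmap call. This is only a sketch: the function name scan_windows and the output names answer.rangeN.bed are made up for illustration, and the numbers you pass are the --range padding in bp on each side of the SNP:

```shell
# Hypothetical helper: run the same bedmap query for several paddings.
# Writes one file per padding in the current directory, e.g. answer.range500.bed.
scan_windows() {
  snps=$1
  annotations=$2
  shift 2
  for half_window in "$@"; do
    bedmap --range "$half_window" --echo --echo-map-id-uniq "$snps" "$annotations" \
      > "answer.range${half_window}.bed"
  done
}

# Usage (500 bp, 5 kb and 50 kb on each side):
#   scan_windows snps_of_interest.bed gencode.v19.genes.bed 500 5000 50000
```

Swapping in a different sorted annotation BED file (promoters, enhancers, TFBS) as the second argument covers the other kinds of target mentioned above.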
It's not an easy task. But if you are looking for something to start with, I would advise you to go to the UCSC Genome Browser and search for the rsID above. It has lots of regulatory tracks that give you peaks for different TFBS and other important regions. You can check whether your SNP disrupts any of them, although that is not straightforward, by the way. It will also show you the nearby genes.
Thank you very much, I will try this.