Dear all, I want to select SNPs of specific genes in promoter region, how to do that?
Thanks in advance
Hello, first get the coordinates of the promoter region from a GFF/GTF file from Ensembl. Then extract all SNPs in that region from dbSNP with the tool tabix. I hope this helps. Finally, please ask your question with more details; as written it is too broad.
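As a rough sketch of the tabix step, the R snippet below turns promoter intervals in BED form into the region strings that tabix accepts (chr:start-end) and assembles the command line without running it. The file name dbsnp.vcf.gz and the helper bed_to_tabix_region are made up for illustration. The VCF has to be bgzip-compressed and tabix-indexed, and its chromosome names must match the ones in the BED file (some dbSNP releases use RefSeq accessions instead of names like chr19).

```r
# Hypothetical helper: convert 0-based, half-open BED intervals into the
# 1-based, inclusive region strings expected by tabix.
bed_to_tabix_region <- function(chrom, bed_start, bed_end) {
  sprintf("%s:%d-%d", chrom, as.integer(bed_start) + 1L, as.integer(bed_end))
}

# Two promoter intervals taken from the BED output posted in this thread
promoters_bed <- data.frame(
  chrom = c("chr19", "chr8"),
  start = c(58359751L, 18388281L),
  end   = c(58365751L, 18394281L),
  stringsAsFactors = FALSE
)

regions <- bed_to_tabix_region(promoters_bed$chrom,
                               promoters_bed$start,
                               promoters_bed$end)

# "dbsnp.vcf.gz" is a placeholder for a bgzip-compressed, tabix-indexed VCF
cmd <- paste("tabix dbsnp.vcf.gz", paste(regions, collapse = " "))
print(cmd)
```

The printed command can be pasted into a shell. For many regions it is probably easier to pass the BED file to tabix directly with its -R option.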
Hello fatma.mokhtar. In general, we take 3000 bp upstream and downstream of the TSS (transcription start site) of a gene as the possible promoter region, or you can adjust this to 2000 bp if you like. So to extract promoter regions from a GTF/GFF file you first need to get the TSS positions from it. This can be complicated, so I think using R may be a better solution. There is also a website, EPD, which collects promoter information, but I don't know it well.
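This is R code to get the promoter regions and save them to a local file.
# Assumes GRCh38 positions are needed
library(rtracklayer, quietly = TRUE)
# txdb is assumed to be the UCSC hg38 knownGene annotation (Entrez gene IDs)
library(TxDb.Hsapiens.UCSC.hg38.knownGene, quietly = TRUE)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Get the regions around each TSS; adjust the lengths as needed
tss <- promoters(genes(txdb), upstream = 3000, downstream = 3000)
# Then save to a file in BED format
export(tss, con = "~/Other/tss.bed", format = "bed")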
These are the first few lines of the BED file we get (columns: chromosome, start, end, gene ID, score, strand).
chr19 58359751 58365751 1 0 -
chr8 18388281 18394281 10 0 +
chr20 44649233 44655233 100 0 -
chr18 28174130 28180130 1000 0 -
chr11 70072433 70078433 100009613 0 -
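To make the strand handling explicit, here is a minimal base-R sketch of the window arithmetic, assuming the same convention as promoters(): the upstream bases come before the TSS and the downstream bases start at the TSS, both counted along the gene's own strand. The helper promoter_window is hypothetical, and the two TSS positions are the ones implied by the chr8 and chr19 rows above.

```r
# Hypothetical helper mimicking the window described above.
# Coordinates are 1-based and inclusive.
promoter_window <- function(tss, strand, upstream = 3000L, downstream = 3000L) {
  plus <- strand == "+"
  data.frame(
    start = ifelse(plus, tss - upstream, tss - downstream + 1L),
    end   = ifelse(plus, tss + downstream - 1L, tss + upstream)
  )
}

# TSS positions implied by the chr8 (+) and chr19 (-) rows of the BED output
win <- promoter_window(tss = c(18391282L, 58362751L), strand = c("+", "-"))

# BED starts are 0-based, so subtract 1 from the 1-based start
win$bed_start <- win$start - 1L
print(win)
```

The bed_start and end columns reproduce the start and end values of those two BED rows, which is a quick way to check that a hand-computed window agrees with the exported file.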
As a general rule of thumb: if your question fits in one sentence, you did not explain it sufficiently. Please see "Brief Reminder On How To Ask A Good Question". It is unclear which data you have, which file format you are using, and which organism you are studying. Please elaborate.
Thank you all for your replies.
Let me reformulate my question.
I have selected some genes and downloaded their SNPs from Ensembl as an Excel file. I want to select SNPs that are common in the European population, with a minor allele frequency (MAF) between 0.24 and 0.49, but from that file I can't tell which SNPs are in the promoter region.
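One possible way to combine the two filters, sketched in base R with made-up data: keep SNPs that fall inside a promoter interval and whose European MAF lies between 0.24 and 0.49. The SNP IDs, positions, and the column name eur_maf are invented for illustration; a real Ensembl export will have different column names and would first need to be read into a data frame.

```r
# Toy promoter table (1-based, inclusive), derived from two BED rows in this thread
promoters <- data.frame(
  gene  = c("1", "10"),
  chrom = c("chr19", "chr8"),
  start = c(58359752L, 18388282L),
  end   = c(58365751L, 18394281L),
  stringsAsFactors = FALSE
)

# Made-up SNPs; in practice these would come from the Ensembl export
snps <- data.frame(
  id      = c("snpA", "snpB", "snpC", "snpD"),
  chrom   = c("chr19", "chr19", "chr8", "chr8"),
  pos     = c(58360000L, 58400000L, 18390000L, 18393000L),
  eur_maf = c(0.30, 0.35, 0.10, 0.45),
  stringsAsFactors = FALSE
)

# TRUE if the SNP falls inside any promoter interval on the same chromosome
in_promoter <- vapply(seq_len(nrow(snps)), function(i) {
  any(promoters$chrom == snps$chrom[i] &
      promoters$start <= snps$pos[i] &
      promoters$end >= snps$pos[i])
}, logical(1))

# Keep promoter SNPs with European MAF in the requested range
selected <- snps[in_promoter & snps$eur_maf >= 0.24 & snps$eur_maf <= 0.49, ]
print(selected$id)
```

For genome-wide data the row-by-row loop would be slow. An overlap function from a package such as GenomicRanges would be the more usual choice there.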
Please use the comment or reply option to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
There are two things wrong here:
From BioMart (Ensembl) I have downloaded the data in an Excel sheet. What is the appropriate way to download it?