Dear all, I want to select SNPs located in the promoter regions of specific genes. How can I do that?
Thanks in advance
Hello, you first need to get the coordinates of the promoter region from a GFF/GTF file from Ensembl. Then extract all SNPs in that region from dbSNP with the tool tabix. I hope this helps. Finally, please ask your question with more details; as posed, it is too broad.
Hello fatma.mokhtar. In general, we take the 3000 bp upstream and downstream of a gene's TSS (transcription start site) as the putative promoter region; you can adjust this to 2000 bp if you prefer. So to extract promoter regions from a GTF/GFF file, you first need to get the TSS positions from it. This can be complicated, so I think R may be a better solution for this question. Alternatively, there is a website, EPD, that collects promoter information, but I am not very familiar with it.
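To make the window arithmetic concrete, here is a minimal base-R sketch of how a strand-aware region around a TSS could be computed. The helper name `promoter_window` is hypothetical, and the coordinate convention (1-based, inclusive, with "upstream" following the gene's strand) is an assumption chosen for illustration; the TSS positions in the usage line are made up.

```r
# Hypothetical helper: promoter window around a TSS, 1-based and inclusive.
# "Upstream" is relative to the gene's strand, so the window is mirrored
# for genes on the minus strand.
promoter_window <- function(tss, strand, upstream = 3000, downstream = 3000) {
  plus  <- strand == "+"
  start <- ifelse(plus, tss - upstream, tss - downstream + 1)
  end   <- ifelse(plus, tss + downstream - 1, tss + upstream)
  # Clamp at 1 so windows near the chromosome start stay valid
  data.frame(start = pmax(start, 1), end = end)
}

# Usage with two made-up TSS positions, one per strand
win <- promoter_window(tss = c(10000, 50000), strand = c("+", "-"))
win
```

Each window spans upstream + downstream bases (6000 with the defaults), unless it is clamped at the chromosome start.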
Here is R code that gets the promoter regions and saves them to a local file.
# Assumes GRCh38 (hg38) coordinates are needed
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(rtracklayer, quietly = TRUE)
# The TxDb object has the same name as its package; alias it as txdb
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Get regions around the TSS of each gene; adjust the lengths as needed
tss <- promoters(genes(txdb), upstream = 3000, downstream = 3000)
# Save the regions to a file in BED format
export.bed(tss, con = "~/Other/tss.bed")
These are the first few lines of the resulting BED file.
chr19 58359751 58365751 1 0 -
chr8 18388281 18394281 10 0 +
chr20 44649233 44655233 100 0 -
chr18 28174130 28180130 1000 0 -
chr11 70072433 70078433 100009613 0 -
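Once promoter regions are available in this form, SNPs can be matched to them by position. The following is a rough base-R sketch, not a definitive method: the two promoter rows are copied from the lines above, while the SNP identifiers and positions are made up for illustration. It assumes BED coordinates (0-based start, end-exclusive) for the promoters and 1-based positions for the SNPs.

```r
# Promoter windows in BED convention, copied from the first two lines above
promoters_bed <- data.frame(
  chrom = c("chr19", "chr8"),
  start = c(58359751, 18388281),
  end   = c(58365751, 18394281),
  gene  = c("1", "10")
)

# Made-up SNPs with 1-based positions (illustrative only)
snps <- data.frame(
  id    = c("snpA", "snpB", "snpC"),
  chrom = c("chr19", "chr8", "chr8"),
  pos   = c(58360000, 18394281, 18400000)
)

# Pair every SNP with every promoter on the same chromosome, then keep
# the pairs where the SNP falls inside the window.
# A BED interval [start, end) covers 1-based positions start + 1 .. end.
pairs <- merge(snps, promoters_bed, by = "chrom")
hits  <- pairs[pairs$pos > pairs$start & pairs$pos <= pairs$end,
               c("id", "chrom", "pos", "gene")]
hits
```

For genome-wide data, an interval-overlap tool is likely a better fit than this all-pairs merge, which grows quickly with the number of SNPs and promoters per chromosome.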
As a general rule of thumb: if your question fits in one sentence, you did not explain it sufficiently. Please see Brief Reminder On How To Ask A Good Question. It is unclear which data you have, which file format you are using, and which organism you are studying. Please elaborate.
Thank you all for your replies.
Let me reformulate my question.
I have selected some genes and downloaded their SNPs from Ensembl as an Excel file. I want to select SNPs that are common in the European population, with a minor allele frequency (MAF) between 0.24 and 0.49, but from that file I cannot tell which SNPs are in the promoter region.
Regards,
Please use ADD COMMENT or ADD REPLY to answer previous reactions, so that this thread remains logically structured and easy to follow. I have now moved your reaction, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
There are two things wrong here:
https://www.ensembl.org/biomart/martview/5750b8dcd08b12d040ed5727b7bab963
From BioMart (Ensembl) I have downloaded the data in an Excel sheet. What is the appropriate way to download it?
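Whatever the download format, once the table can be read into R (for example as tab-separated text), the MAF filter for the 0.24 to 0.49 range can be sketched in base R as below. The column names `variant_id` and `eur_maf` are hypothetical and would need to match the real export, and the rows are made up for illustration.

```r
# Made-up rows standing in for a variant table exported from BioMart.
# Column names are hypothetical; rename them to match the real export.
variants <- data.frame(
  variant_id = c("snpA", "snpB", "snpC", "snpD"),
  eur_maf    = c(0.30, 0.10, 0.49, NA)
)

# Keep variants whose European MAF lies in [0.24, 0.49].
# Rows with a missing MAF are dropped explicitly.
keep <- !is.na(variants$eur_maf) &
  variants$eur_maf >= 0.24 & variants$eur_maf <= 0.49
common <- variants[keep, ]
common
```

This only handles the frequency criterion; deciding which of the remaining SNPs lie in a promoter still requires their genomic positions.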