How to find the promoter sequence of a gene?
21 months ago
sunyeping ▴ 110

Hello all,

Do you know of any database or software that can help define the positions and sequences of the promoter of a given gene, including the sites bound by a given transcription factor?

Best regards

promoter • 9.4k views
ADD COMMENT

Why is it so difficult to provide proper information? Presumably you know that finding promoters in prokaryotes and finding them in eukaryotes are two completely different problems. The former is relatively well understood and there are databases, while the latter faces significant challenges.

It is easy to find a good deal of info on this subject by Googling.

ADD REPLY

Thank you for your response. I would like to identify promoters of eukaryotic genes, for example, the promoters of the Itga1 and Itgae genes. What is the accepted software or database for this, and what is the procedure? I have tried Googling but still have not found a satisfying answer.

Regards.

ADD REPLY

Please describe in more detail what kind of input you have and which organism you are working with.

As a rule of thumb, the region from -500 bp to +100 bp around a transcription start site (TSS) is typically considered the promoter region. For organisms with transcript annotation, you can retrieve the sequences with BioMart, or extract them from the reference genome FASTA with BEDTools or SeqKit.
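To make the rule of thumb concrete, here is a minimal Python sketch of how a strand-aware promoter window could be turned into BED coordinates for a tool such as BEDTools. The function name `promoter_window` and the convention that the TSS itself counts as the first downstream base are assumptions made for illustration, not part of any tool mentioned in this thread.

```python
def promoter_window(chrom, tss, strand, upstream=500, downstream=100):
    """Return a BED-style (chrom, start, end, strand) interval around a TSS.

    `tss` is a 1-based genomic position. The result is 0-based and
    half-open, as in BED. The TSS base is counted as the first
    downstream base (an assumption; conventions differ between tools).
    """
    if strand == "+":
        # Upstream means lower coordinates on the plus strand.
        start = tss - 1 - upstream
        end = tss - 1 + downstream
    elif strand == "-":
        # Upstream means higher coordinates on the minus strand.
        start = tss - downstream
        end = tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp at the chromosome start so the interval stays valid.
    return chrom, max(0, start), end, strand


# Example with made-up coordinates:
print(promoter_window("chr1", 1000, "+"))  # ('chr1', 499, 1099, '+')
print(promoter_window("chr1", 1000, "-"))  # ('chr1', 900, 1500, '-')
```

Each returned tuple can be written out as one tab-separated BED line and used to pull the sequence from the reference genome. For minus-strand genes the extracted sequence also needs to be reverse-complemented.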

To define the TSS, you can look for published CAGE-seq data (e.g., from the FANTOM consortium) if your organism is eukaryotic, since the method depends on capped RNAs. ChIP-seq data for your transcription factor can be used to run a motif search with HOMER. The lab of Aviv Regev has also published corresponding ML models.

ADD REPLY

The organism is mouse.

ADD REPLY

Short answer: you won't find a good 'black box' you can just run for this (at least with any confidence in the output).

Long answer: do an RNA-seq experiment, obtain the transcriptional data for your genome, and look for reads mapping to intergenic regions upstream of genes. This will still be a 'woolly' answer, though, as promoter boundaries are not always clear-cut.

ADD REPLY

Hello Jeo,

We found that Jun/Fos are highly expressed in our RNA-seq data. But to find reads mapping to regions upstream of genes, do you mean ATAC-seq?

I don't need the exact boundary of the transcription factor binding region. I just need to know whether Jun/Fos binds upstream of a given gene and which sequences are necessary for the binding. Is there no bioinformatic tool to do this?

As I understand it, RNA polymerases also bind to promoters for transcription. Are the promoter sequences for RNA polymerase binding different from those for transcription factors? How can I define the promoter sequences for RNA polymerase binding upstream of a given gene?

Thank you.

ADD REPLY

You can use the UCSC Genome Browser to get the sequence of X bases upstream of a gene, then use tools in the MEME Suite or JASPAR to look for Jun/Fos motifs in that region.
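As a rough first pass before running a proper motif scanner, an upstream sequence can be searched for the textbook AP-1 (Jun/Fos) consensus site TGA(C/G)TCA. The Python sketch below is a simplification under stated assumptions: it matches the exact consensus with a regular expression rather than scoring a position weight matrix as the MEME Suite or JASPAR tools do, and the function name `find_ap1_sites` is made up for illustration.

```python
import re

# Textbook AP-1 consensus (TPA response element): TGA(C/G)TCA.
# The pattern class is its own reverse complement (TGACTCA <-> TGAGTCA),
# so scanning the forward strand alone already finds sites on both strands.
AP1_CONSENSUS = re.compile(r"TGA[CG]TCA")


def find_ap1_sites(seq):
    """Return (0-based start, matched site) for each exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in AP1_CONSENSUS.finditer(seq)]


# Example on a made-up sequence:
for start, site in find_ap1_sites("aaTGACTCAggTGAGTCAtt"):
    print(start, site)
```

Exact-consensus matching misses weaker, degenerate sites and says nothing about whether a site is actually bound in vivo, so hits are only candidates to follow up with a matrix-based scan or ChIP-seq data.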

I think what's being conveyed by the other user is that there's the proximal promoter (which the above analysis will capture) but also distal promoters/enhancers that could be thousands and thousands of bases away that will be missed doing this quick type of analysis. You might be better off doing a ChIP-seq experiment or taking advantage of publicly available ChIP-seq data.

ADD REPLY
21 months ago
Getting there ▴ 120

I would start with ENCODE data, available from the ENCODE Project.

You can select the specific transcription factor you are interested in, a tissue type, an organism, etc., and download a BED file with the ChIP-seq peaks for that transcription factor. You can also look at histone methylation patterns to get some idea of promoter/enhancer regions.
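Once a peak file is downloaded, checking whether any peak falls in the region upstream of a gene is a simple interval-overlap test. The Python sketch below assumes only the first three BED columns (chrom, start, end); the helper names and all coordinates are hypothetical and are not taken from ENCODE.

```python
def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines, skipping headers and blanks."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)


def peaks_in_region(peaks, region):
    """Return the peaks that overlap `region` (both 0-based, half-open)."""
    r_chrom, r_start, r_end = region
    return [
        (chrom, start, end)
        for chrom, start, end in peaks
        if chrom == r_chrom and start < r_end and end > r_start
    ]


# Made-up peaks and a made-up promoter interval, for illustration only.
example_bed = [
    "track name=example",
    "chr13\t1000\t1200\tpeak1",
    "chr13\t5000\t5300\tpeak2",
    "chr11\t1050\t1150\tpeak3",
]
promoter = ("chr13", 900, 1500)
print(peaks_in_region(parse_bed(example_bed), promoter))
```

For genome-wide work, `bedtools intersect` does the same job far more efficiently; the sketch only shows the logic for a single gene.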

ADD COMMENT
21 months ago
sunyeping ▴ 110

Steps to find transcription factor (TF) binding motifs

  1. In NCBI Gene, search for and open the page for the gene of interest, then use Tools > Sequence Text View. Using the color labels, find the transcription start site (TSS).
  2. Select the sequence from -1000 to +300 around the TSS.
  3. On the JASPAR website, select the species and the TFs, paste in the selected sequence, and scan it for the TF motifs.
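For readers curious about what the scan in step 3 does conceptually, here is a minimal Python sketch of position weight matrix (PWM) scoring. The count matrix `TOY_COUNTS` is a made-up 4-column example, not a real JASPAR profile, and the pseudocount, uniform background, and threshold are arbitrary assumptions. Only the forward strand is scored; a real scan would also score the reverse complement.

```python
import math

# Hypothetical toy count matrix (NOT a real JASPAR profile).
# Each list holds the counts of one base at motif positions 1 to 4.
TOY_COUNTS = {
    "A": [1, 1, 8, 1],
    "C": [1, 1, 0, 1],
    "G": [1, 7, 1, 1],
    "T": [7, 1, 1, 7],
}


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts to per-position log2 odds against a uniform background."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (0-based start, window, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


pwm = log_odds_matrix(TOY_COUNTS)
print(scan("CCTGATCC", pwm, threshold=4.0))
```

In practice the matrix would come from the JASPAR profile of the TF of interest, and the threshold is usually expressed as a relative score or p-value rather than a raw log-odds cutoff.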