Question

Best tool to find potential TF binding sites within a specific DNA sequence?

4

7.5 years ago

wildtype ▴ 60

Hi, I don't have much experience with motif searches, and I would like to hear your advice on the following task:

I have a DNA sequence (~300 bp) that hypothetically contains a regulatory motif, for example a 300 bp region upstream of the TSS of a gene. There is no prior knowledge of what could be binding there, and I want some predictions. What tool would be best to scan for motifs similar to any of the known TF binding motifs in Drosophila? Further, what would be a good tool for finding a conserved motif in an alignment from multiple species? (Again Drosophila; I don't want to find just any motif, but a motif corresponding to a known factor.)

Thanks in advance
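For the second part of the question (a known motif conserved across an alignment), the simplest form of phylogenetic footprinting can be sketched as below. It assumes the known factor's motif is summarized as an IUPAC consensus string and that the alignment is a dict of equal-length, already-aligned sequences. The `WGATAR` consensus and the three sequences are hypothetical, used only as an example. Real tools score PWMs and tolerate partial conservation rather than requiring an exact match in every species.

```python
# Sketch: find alignment columns where a known motif (IUPAC consensus)
# matches in every species on the same strand (ungapped windows only).

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code, e.g. R (A/G) -> Y (C/T)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp_iupac(consensus: str) -> str:
    """Reverse complement of an IUPAC consensus (to search the minus strand)."""
    return consensus.translate(COMPLEMENT)[::-1]


def matches(window: str, consensus: str) -> bool:
    """True if each base in the window is allowed by the consensus code at that position."""
    return all(base in IUPAC[code] for base, code in zip(window.upper(), consensus))


def conserved_hits(alignment: dict[str, str], consensus: str) -> list[tuple[int, str]]:
    """Return (alignment column, strand) for windows matching in all species.

    Windows containing a gap ('-') in any species are skipped, so only
    motif instances that are ungapped in the alignment are reported.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    assert all(len(r) == length for r in rows), "alignment rows must be equal length"
    w = len(consensus)
    strands = {"+": consensus, "-": revcomp_iupac(consensus)}
    hits = []
    for start in range(length - w + 1):
        windows = [r[start:start + w] for r in rows]
        if any("-" in win for win in windows):
            continue
        for strand, pattern in strands.items():
            if all(matches(win, pattern) for win in windows):
                hits.append((start, strand))
    return hits


alignment = {  # hypothetical, already-aligned sequences
    "sp1": "CCAGATAAGG",
    "sp2": "CCTGATAGGG",
    "sp3": "CCAGATAACG",
}
print(conserved_hits(alignment, "WGATAR"))  # [(2, '+')]
```

Requiring a match in every species is the strictest possible rule. A more forgiving variant would report windows that match in most species.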

sequence tf binding motif • 9.0k views

updated 7.5 years ago by Whoknows ▴ 960 • written 7.5 years ago by wildtype ▴ 60

score 4 · Answer 1 · 2017-05-25

4

7.5 years ago

Petr Ponomarenko ★ 2.8k

As kennethcondon2007 said, no matter what you choose to search for TFBSs in a single sequence, you will get lots of false positives. Using multiple co-regulated genes and comparing their promoters for enriched signal is one way of reducing false positives. The second option is to search SNP databases (this is somewhat similar to conservation), as the binding sites of some TFs tend to be highly conserved, with a much lower SNP frequency. The third option is to focus on TFBSs and/or motifs that have a very narrow window of possible locations, for example the TATA box, and build from them using known TF-TF interactions. Fourth, if you know your gene is regulated by one TF and definitely not by another, you can force in the TFBSs for the first and exclude those for the second. Fifth, look up orthologous and paralogous genes, including pseudogenes; their promoter organization is sometimes conserved. Sixth, if you use PWMs, say from TRANSFAC with Match, make sure you know the origin of each matrix and what its orientation means. For some TFBSs the orientation matters, usually relative to some other nearby TFBS.

I am not an expert on Drosophila or your gene, but your gene can have alternative transcription start sites and alternative promoters. Finally, some TFs bind downstream of the TSS, and yours might be of this kind...
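To make the PWM point concrete, here is a bare-bones log-odds scan of one sequence on both strands. It is not a reimplementation of Match or any other tool, which use their own scoring schemes. The count matrix, pseudocount, uniform background and threshold are all assumptions chosen for illustration. Each hit carries its strand, which is what lets you reason about a site's orientation relative to a neighbouring site.

```python
import math

BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_pwm(counts, pseudocount=0.5, background=None):
    """Convert per-position base counts into log2-odds scores against a background."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # uniform background (an assumption)
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def revcomp(seq):
    return "".join(COMP[b] for b in reversed(seq))


def scan(seq, pwm, threshold):
    """Score every window on both strands; report (start, strand, score) >= threshold.

    `start` is the 0-based plus-strand position for both strands, so a
    minus-strand hit is seq[start:start + len(pwm)] read as its reverse complement.
    """
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguous bases
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(pwm[i][b] for i, b in enumerate(site))
            if score >= threshold:
                hits.append((start, strand, round(score, 2)))
    return hits


# Hypothetical count matrix for a "GATA" site (10 sequences, perfectly conserved)
counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
print(scan("CCGATACCTATCCC", pwm, threshold=7.0))  # [(2, '+', 7.23), (8, '-', 7.23)]
```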

1

Thanks for the insights. I didn't explain the exact biological question, just an analogous example for simplicity; in reality it's a region within the first intron, not upstream of the TSS. This region is well conserved across the 12 Drosophila genomes, and there's a peak of DNA accessibility in D. mel. These observations make me think that some factor must be binding there (not necessarily a known one). I would like to check for the presence of possible known motifs there, fully aware that there could be false positives, but I don't know where else to start. There are a few other genes that seem to be co-regulated, but that could be for other reasons, so I am not sure whether adding them would help or hurt. I tried looking at ChIP-seq/ChIP-chip data for this region on the modENCODE browser, but those data are from embryos and cover only a few TFs, and there weren't any convincing peaks.

written 7.5 years ago by wildtype ▴ 60

0

Interesting. Do you see that conservation and accessibility peak in the same region in other species? Do you have access to a wet lab or funding to order wet lab tests elsewhere, or is this a purely bioinformatics task for you?

written 7.5 years ago by Petr Ponomarenko ★ 2.8k

0

I do. I will clone the fragment and put it upstream of a reporter to see what happens, but meanwhile I wanted to check whether I could predict anything computationally.

written 7.5 years ago by wildtype ▴ 60

0

God, the complexity. That even scared me a bit.

written 7.5 years ago by BioinfGuru ★ 2.1k

score 1 · Answer 2 · 2017-05-24

The problem with a single sequence is the size of the search space. In your case you have a 300-base sequence. You have no idea of the length of possible motifs, if any, or where they occur. You would be searching a database of many motifs of many different lengths, so the number of possible matches is enormous, and any results you get could occur completely by chance and have no biological relevance whatsoever.
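The "enormous number of possible matches" can be put in numbers with a back-of-the-envelope expectation: windows x strands x per-window chance of a hit x number of motifs searched. This is a sketch that assumes independent, uniformly random bases, which real sequence is not. The per-window probability depends on the motif and the score threshold; an exact 6-mer match (0.25 ** 6) is used only because it is easy to compute, and the 500-motif library is hypothetical.

```python
def expected_chance_hits(seq_len, motif_len, p_window, n_motifs=1, both_strands=True):
    """Expected number of above-threshold hits in a random sequence.

    p_window: probability that one random window passes the score threshold
    (for an exact k-mer match under a uniform background this is 0.25 ** k).
    """
    windows = max(seq_len - motif_len + 1, 0)
    strands = 2 if both_strands else 1
    return windows * strands * p_window * n_motifs


# Exact 6-mer in 300 bp, both strands: 590 windows * (1/4096) per motif
print(round(expected_chance_hits(300, 6, 0.25 ** 6), 3))  # 0.144
# Same, summed over a hypothetical library of 500 motifs of that stringency
print(round(expected_chance_hits(300, 6, 0.25 ** 6, n_motifs=500), 1))  # 72.0
```

So even a single 300 bp sequence would be expected to collect dozens of purely chance hits against a motif library of that size.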

A motif search is usually carried out on a group of related sequences (not a single sequence) to find short seeds that are enriched. For example, if you have a set of 10 co-expressed genes, you can extract the 300 bases upstream of each TSS. This set of 10 x 300-base sequences can then be analysed for enriched short fragments. It is those short fragments that are then used as search items against a database of TF motifs.
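The enrichment idea can be sketched as counting short k-mers (collapsing the two strands) in the co-expressed set and comparing against a background set. This is a crude illustration, not how a dedicated motif discovery tool works. There is no statistical test, and k, the pseudocount, and both sequence sets are assumptions. Presence is counted per sequence rather than per occurrence, so one repetitive sequence cannot dominate. The top k-mers would then be compared against a database of known TF motifs.

```python
from collections import Counter

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement (strand-agnostic)."""
    rc = "".join(COMP[b] for b in reversed(kmer))
    return min(kmer, rc)


def kmer_presence(seqs, k):
    """For each canonical k-mer, count how many sequences contain it on either strand."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        present = {canonical(s[i:i + k]) for i in range(len(s) - k + 1)
                   if set(s[i:i + k]) <= set("ACGT")}
        counts.update(present)
    return counts


def enriched_kmers(targets, background, k=6, pseudocount=1.0, top=10):
    """Rank k-mers by (fraction of target sequences containing them) divided by
    (fraction of background sequences containing them), with pseudocounts."""
    fg = kmer_presence(targets, k)
    bg = kmer_presence(background, k)
    n_fg, n_bg = len(targets), len(background)
    ratio = {
        kmer: ((fg[kmer] + pseudocount) / (n_fg + 2 * pseudocount))
        / ((bg[kmer] + pseudocount) / (n_bg + 2 * pseudocount))
        for kmer in fg
    }
    return sorted(ratio.items(), key=lambda kv: kv[1], reverse=True)[:top]


targets = ["CCCGATAAGCCC", "TTGATAAGTT", "ACGATAAGCA"]  # hypothetical co-expressed set
background = ["CCCCCCCCCC", "ATATATATAT"]              # hypothetical background set
print(enriched_kmers(targets, background, k=6, top=1))  # [('CTTATC', 3.2)]
```

`CTTATC` is the canonical form of `GATAAG`, the only 6-mer shared by all three target sequences.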

You need more sequences with a close relationship to your current one. "Close relationship" can be defined in many ways: co-expressed, tissue-specific, homologues, and so on.

score 0 · Answer 3 · 2017-05-25

0

7.5 years ago

Ben ▴ 60

The best way to find the potential binding site in the 300bp region is to use ChIP-seq data.
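If peak calls are available as BED files (for example downloaded from modENCODE, as the asker mentions), checking them against the 300 bp region reduces to an interval-overlap test. This is a minimal sketch that assumes BED-formatted lines with 0-based, half-open coordinates. The chromosome names, coordinates and peak names below are hypothetical.

```python
def overlapping_peaks(bed_lines, chrom, start, end):
    """Return BED records (chrom, start, end, name) overlapping [start, end).

    BED coordinates are 0-based, half-open; the query uses the same convention.
    """
    hits = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        c, s, e = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        if c == chrom and s < end and e > start:
            hits.append((c, s, e, name))
    return hits


peaks = [  # hypothetical BED lines
    "track name=example",
    "chr2L\t100\t400\tpeak1",
    "chr2L\t500\t600\tpeak2",
    "chr3R\t100\t400\tpeak3",
]
print(overlapping_peaks(peaks, "chr2L", 350, 650))
# [('chr2L', 100, 400, 'peak1'), ('chr2L', 500, 600, 'peak2')]
```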

score 0 · Answer 4 · 2017-05-25

0

7.5 years ago

Whoknows ▴ 960

You could try Genomatix. It has two tools: one for finding the best TF binding sites and one for finding the best TFs for candidate genes.

Genomatix works from either an input sequence or a gene symbol to find candidate binding sites or TFs.
