Question

DNA binding site motif database

0

9.8 years ago

jolespin ▴ 150

I have a list of unique k-mers (5-mers in this case) that are essential to the pathway I'm researching. Is there a database where I can look up which proteins recognize these motifs? Proteins that bind DNA or RNA are both fine; I'm just not sure where to find such a database. I'm looking at human sequences, but it would be great if there were one that covered all organisms too. For example, say you were looking for all proteins that bind "TCCTG".

gene RNA-Seq ChIP-Seq genome protein • 7.3k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.8 years ago by jolespin ▴ 150

0

Did you try Tomtom from the MEME Suite? It searches a motif against many databases. You may have to convert your k-mers to motifs first.
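As a rough illustration of what converting a k-mer to a motif could look like, here is a minimal Python sketch that writes a k-mer as a MEME-format motif with a probability of 1 for the observed base at each position. The function name `kmer_to_meme` and the choice of a uniform background are assumptions made for this example; the header lines simply mirror the layout of the MEME motif files discussed in this thread.

```python
def kmer_to_meme(kmer, name=None):
    """Return a MEME-format motif text for one exact k-mer (hypothetical helper)."""
    alphabet = "ACGT"
    kmer = kmer.upper()
    if not kmer or any(base not in alphabet for base in kmer):
        raise ValueError("k-mer must be a non-empty string over ACGT")

    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies (from uniform background):",
        "A 0.25000 C 0.25000 G 0.25000 T 0.25000",
        "",
        f"MOTIF {name or kmer}",
        "",
        f"letter-probability matrix: alength= 4 w= {len(kmer)} nsites= 1 E= 0",
    ]
    # One row per position; the observed base gets probability 1, the rest 0.
    for base in kmer:
        row = [1.0 if letter == base else 0.0 for letter in alphabet]
        lines.append("  " + "  ".join(f"{p:.6f}" for p in row))
    return "\n".join(lines) + "\n"


print(kmer_to_meme("TCCTG"))
```

The printed text could be saved to a file and used as the query motif for Tomtom, with one of the downloaded database files as the target. Keep in mind that a 5-mer is a very short query, so matches against longer database motifs should be interpreted with caution.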

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by Fidel ★ 2.0k

Ram · Accepted Answer · 2015-10-29

4

9.8 years ago

Kamil ★ 2.3k

Try the MEME archive of motif databases. It includes multiple databases and species.

After downloading the motif_databases.12.6.tar.gz file and extracting it, we'll see a folder called motif_databases containing many subfolders, one for each of several different motif databases.

For example, let's have a look at motif_databases/CIS-BP/Homo_sapiens.meme:

MEME version 4.4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF M0085_1.02 (TFAP2E)_(Mus_musculus)_(DBD_0.99)

letter-probability matrix: alength= 4 w= 10 nsites= 1 E= 0
  0.213214      0.176319      0.135951      0.474516    
  0.222124      0.321576      0.134725      0.321576    
  0.004784      0.213834      0.753855      0.027528    
  0.000431      0.961004      0.000168      0.038397    
  0.000296      0.937765      0.000164      0.061775    
  0.001274      0.327763      0.135950      0.535013    
  0.122837      0.314460      0.391841      0.170862    
  0.454990      0.254748      0.230506      0.059756    
  0.118505      0.001133      0.841307      0.039054    
  0.002871      0.001657      0.957955      0.037517    

URL http://cisbp.ccbr.utoronto.ca/TFreport.php?searchTF=T004846_1.02

This motif is for TFAP2E (transcription factor AP-2 epsilon). The protein that binds the motif is named in parentheses on the MOTIF line, and we can learn more about it at the URL listed at the bottom of the record.
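To get from such a file to a list of proteins, one option is to parse the records yourself. The following is a minimal sketch, assuming every record follows the layout of the excerpt above (a MOTIF line, a "letter-probability matrix" header with a `w=` field, the matrix rows, then an optional URL line). The function name `parse_meme` and the dictionary keys are hypothetical choices for this example.

```python
import re


def parse_meme(text):
    """Parse MEME-format text into a list of motif records (illustrative only)."""
    motifs = []
    current = None
    rows_left = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("MOTIF"):
            # "MOTIF <id> <optional description naming the protein>"
            parts = line.split(None, 2)
            current = {
                "id": parts[1] if len(parts) > 1 else "",
                "name": parts[2] if len(parts) > 2 else "",
                "matrix": [],
                "url": None,
            }
            motifs.append(current)
            rows_left = 0
        elif current is None:
            continue  # still in the file header
        elif line.startswith("letter-probability matrix"):
            width = re.search(r"w=\s*(\d+)", line)
            rows_left = int(width.group(1)) if width else 0
        elif line.startswith("URL"):
            parts = line.split(None, 1)
            current["url"] = parts[1] if len(parts) > 1 else None
        elif rows_left > 0 and line:
            # One row per motif position, columns in ACGT order.
            current["matrix"].append([float(x) for x in line.split()])
            rows_left -= 1
    return motifs


# Example usage (path taken from the answer above):
# with open("motif_databases/CIS-BP/Homo_sapiens.meme") as handle:
#     for motif in parse_meme(handle.read()):
#         print(motif["id"], motif["name"], motif["url"])
```

Each record then carries the motif ID, the description that names the binding protein, the matrix, and the URL for further details.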

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 9.8 years ago by Kamil ★ 2.3k

0

I'm not finding a way to actually get the proteins that bind these motifs.

ADD REPLY • link 9.8 years ago by jolespin ▴ 150

0

I edited my answer to include more details. You might want to check out the CISBP website.

ADD REPLY • link 9.8 years ago by Kamil ★ 2.3k

0

Thanks Kamil. Which k-mer did you search for here? I see the alphabet, but that isn't the k-mer, is it? For example, if you were looking for proteins that bind "TCCTG".

ADD REPLY • link 9.8 years ago by jolespin ▴ 150

0

I didn't search for a k-mer; I'm showing you an example of a motif called M0085_1.02. The letter-probability matrix describes the motif. In this case, the motif is 10 bases long. Each row in the matrix is one position in the motif, and each column corresponds to one of the letters in the alphabet ACGT, in that order. So the probability of an A in the first position is 0.213214, the probability of an A in the second position is 0.222124, and so on.
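To make the matrix concrete, here is a small Python sketch that slides "TCCTG" (and its reverse complement, since the file lists both strands) along the TFAP2E matrix from the answer above and multiplies the per-position probabilities. This is only an illustration of how to read the matrix, not how Tomtom scores motif similarity; the function names are hypothetical.

```python
ALPHABET = "ACGT"

# Letter-probability matrix for M0085_1.02 (TFAP2E), copied from the record above.
TFAP2E = [
    [0.213214, 0.176319, 0.135951, 0.474516],
    [0.222124, 0.321576, 0.134725, 0.321576],
    [0.004784, 0.213834, 0.753855, 0.027528],
    [0.000431, 0.961004, 0.000168, 0.038397],
    [0.000296, 0.937765, 0.000164, 0.061775],
    [0.001274, 0.327763, 0.135950, 0.535013],
    [0.122837, 0.314460, 0.391841, 0.170862],
    [0.454990, 0.254748, 0.230506, 0.059756],
    [0.118505, 0.001133, 0.841307, 0.039054],
    [0.002871, 0.001657, 0.957955, 0.037517],
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def window_probabilities(kmer, matrix):
    """Probability of the k-mer at every offset where it fits inside the motif."""
    column = {base: i for i, base in enumerate(ALPHABET)}
    results = []
    for offset in range(len(matrix) - len(kmer) + 1):
        prob = 1.0
        for i, base in enumerate(kmer):
            prob *= matrix[offset + i][column[base]]
        results.append((offset, prob))
    return results


def best_match(kmer, matrix):
    """Return (probability, strand, offset) for the best placement on either strand."""
    candidates = []
    for strand, seq in (("+", kmer), ("-", reverse_complement(kmer))):
        for offset, prob in window_probabilities(seq, matrix):
            candidates.append((prob, strand, offset))
    return max(candidates)


print(best_match("TCCTG", TFAP2E))
```

Running the same calculation over every motif in a database file and ranking the results would give a crude shortlist of candidate proteins, though a dedicated tool that reports statistical significance is the safer choice.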

As Fidel mentioned, you might consider running TOMTOM or GOMo with your "TCCTG" sequence.

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.7 years ago by Kamil ★ 2.3k