DNA binding site motif database
9.1 years ago
jolespin ▴ 150

I have a list of unique k-mers (5-mers in this case) that are essential to the pathway I'm researching. Is there a database where I can look up which proteins recognize these motifs? DNA-binding or RNA-binding proteins are both fine; I'm just not sure where to find such a database. I'm looking at human sequences, but it would be great if there were one that covered all organisms too. For example, say you were looking for all proteins that bind "TCCTG".

gene RNA-Seq ChIP-Seq genome protein • 6.6k views

Did you try Tomtom from the MEME Suite? It compares a query motif against many motif databases. You may have to convert your k-mers into a motif first.
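One way to turn a k-mer into a motif is to write it out as a letter-probability matrix in minimal MEME text format. The sketch below is only an illustration: the function name `kmer_to_meme` and the pseudocount value are my own assumptions, not part of the MEME Suite, and the header mirrors the layout of the database files rather than any official recommendation.

```python
def kmer_to_meme(kmer, name=None, pseudocount=0.01):
    """Return minimal MEME-format text describing a single k-mer.

    Hypothetical helper: the observed base at each position gets most of
    the probability mass; the pseudocount (an assumed value) keeps the
    other three bases from being exactly zero.
    """
    alphabet = "ACGT"
    kmer = kmer.upper()
    if not kmer or any(base not in alphabet for base in kmer):
        raise ValueError("k-mer must be a non-empty string over ACGT")

    total = 1.0 + pseudocount * len(alphabet)
    rows = []
    for base in kmer:
        # One row per position, one column per letter in A, C, G, T order.
        probs = [((1.0 if letter == base else 0.0) + pseudocount) / total
                 for letter in alphabet]
        rows.append("  ".join(f"{p:.6f}" for p in probs))

    header = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies (from uniform background):",
        "A 0.25000 C 0.25000 G 0.25000 T 0.25000",
        "",
        f"MOTIF {name or kmer}",
        "",
        f"letter-probability matrix: alength= 4 w= {len(kmer)} nsites= 1 E= 0",
    ]
    return "\n".join(header + rows) + "\n"


print(kmer_to_meme("TCCTG"))
```

The printed text can be saved to a file and used as the query motif. Keep in mind that a 5-base motif carries little information, so expect many weak matches.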

9.1 years ago
Kamil ★ 2.3k

Try the MEME archive of motif databases. It includes multiple databases and species.

After downloading the motif_databases.12.6.tar.gz file and extracting it, you'll see a folder called motif_databases containing many subfolders, one for each of the different motif databases.

For example, let's have a look at motif_databases/CIS-BP/Homo_sapiens.meme

MEME version 4.4

ALPHABET= ACGT

strands: + -

Background letter frequencies (from uniform background):
A 0.25000 C 0.25000 G 0.25000 T 0.25000 

MOTIF M0085_1.02 (TFAP2E)_(Mus_musculus)_(DBD_0.99)

letter-probability matrix: alength= 4 w= 10 nsites= 1 E= 0
  0.213214      0.176319      0.135951      0.474516    
  0.222124      0.321576      0.134725      0.321576    
  0.004784      0.213834      0.753855      0.027528    
  0.000431      0.961004      0.000168      0.038397    
  0.000296      0.937765      0.000164      0.061775    
  0.001274      0.327763      0.135950      0.535013    
  0.122837      0.314460      0.391841      0.170862    
  0.454990      0.254748      0.230506      0.059756    
  0.118505      0.001133      0.841307      0.039054    
  0.002871      0.001657      0.957955      0.037517    

URL http://cisbp.ccbr.utoronto.ca/TFreport.php?searchTF=T004846_1.02

This motif is for TFAP2E (transcription factor AP-2 epsilon), the protein named in parentheses on the MOTIF line. You can learn more about it at the URL listed at the bottom of the record.
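To get the protein behind each motif, you can read the MOTIF lines of the file. Here is a minimal sketch, assuming every record follows the naming pattern of the example above (motif ID, then the protein name in the first pair of parentheses). The function name `list_motifs` is hypothetical, and other databases in the archive may name their motifs differently.

```python
import re


def list_motifs(lines):
    """Return (motif_id, protein) pairs from the MOTIF lines of a MEME file.

    Assumes the CIS-BP style shown in the example record, where the second
    field looks like "(TFAP2E)_(Mus_musculus)_(DBD_0.99)".
    """
    motifs = []
    for line in lines:
        if not line.startswith("MOTIF"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed MOTIF line without an identifier
        motif_id = fields[1]
        alt_name = fields[2] if len(fields) > 2 else ""
        # Take the text inside the first pair of parentheses as the protein.
        match = re.match(r"\(([^)]+)\)", alt_name)
        protein = match.group(1) if match else alt_name
        motifs.append((motif_id, protein))
    return motifs


example = """MEME version 4.4

ALPHABET= ACGT

MOTIF M0085_1.02 (TFAP2E)_(Mus_musculus)_(DBD_0.99)

letter-probability matrix: alength= 4 w= 10 nsites= 1 E= 0
"""

for motif_id, protein in list_motifs(example.splitlines()):
    print(motif_id, protein)
```

For the real file, pass an open file handle instead of `example.splitlines()`, for example `list_motifs(open("motif_databases/CIS-BP/Homo_sapiens.meme"))`.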


I'm not finding a way to actually get the proteins that bind these motifs.


I edited my answer to include more details. You might want to check out the CISBP website.


Thanks Kamil. What is the k-mer you searched for here? I see the alphabet, but that isn't the k-mer, is it? For example, what would this look like if you were looking for proteins that bind "TCCTG"?


I didn't search for a k-mer. I'm showing you an example of a motif called M0085_1.02. The letter-probability matrix describes the motif. In this case, the motif is 10 bases long (w= 10). Each row of the matrix corresponds to one position in the motif, and each column corresponds to one of the letters in the alphabet ACGT, in that order. So the probability of an A in the first position is 0.213214, the probability of an A in the second position is 0.222124, and so on.
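To make the matrix concrete, here is a rough sketch that reads off the most likely base at each position and computes the probability of "TCCTG" at each offset along the motif. The function names are my own, and this is only an illustration of how to read the matrix: it looks at the forward strand only and says nothing about statistical significance, which a tool like TOMTOM handles for you.

```python
ALPHABET = "ACGT"

# Letter-probability matrix of M0085_1.02, copied from the record above.
# Rows are motif positions; columns are A, C, G, T.
MATRIX = [
    [0.213214, 0.176319, 0.135951, 0.474516],
    [0.222124, 0.321576, 0.134725, 0.321576],
    [0.004784, 0.213834, 0.753855, 0.027528],
    [0.000431, 0.961004, 0.000168, 0.038397],
    [0.000296, 0.937765, 0.000164, 0.061775],
    [0.001274, 0.327763, 0.135950, 0.535013],
    [0.122837, 0.314460, 0.391841, 0.170862],
    [0.454990, 0.254748, 0.230506, 0.059756],
    [0.118505, 0.001133, 0.841307, 0.039054],
    [0.002871, 0.001657, 0.957955, 0.037517],
]


def consensus(matrix):
    """Most probable base at each position (ties go to the earlier letter)."""
    return "".join(
        ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)]
        for row in matrix
    )


def kmer_probabilities(matrix, kmer):
    """Probability of the k-mer at every offset where it fits in the motif."""
    scores = []
    for offset in range(len(matrix) - len(kmer) + 1):
        p = 1.0
        for i, base in enumerate(kmer):
            p *= matrix[offset + i][ALPHABET.index(base)]
        scores.append(p)
    return scores


scores = kmer_probabilities(MATRIX, "TCCTG")
best_offset = max(range(len(scores)), key=scores.__getitem__)
print("consensus:", consensus(MATRIX))
print("best offset for TCCTG:", best_offset, "probability:", scores[best_offset])
```

Repeating this over every motif in a database file would give a crude ranking of which motifs are most compatible with your k-mer.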

As Fidel mentioned, you might consider running TOMTOM or GOMo with your "TCCTG" sequence.
