I have a list of unique k-mers (5-mers in this case) that are essential to the pathway I'm researching. Is there a database where I can look up which proteins recognize these motifs? Proteins that bind either DNA or RNA are fine; I'm just not sure where to find such a database. I'm looking at human sequences, but it would be great if there were one that covered all organisms too. For example, say you were looking for all proteins that bind "TCCTG".
Try the MEME archive of motif databases. It includes multiple databases and species.
After downloading the motif_databases.12.6.tar.gz file and extracting it, you'll see a folder called motif_databases containing many subfolders, one for each of several different motif databases.
For example, let's have a look at motif_databases/CIS-BP/Homo_sapiens.meme
Thanks Kamil. Which k-mer did you search for in this? I see the alphabet, but that isn't the k-mer, is it? For instance, what if you were looking for proteins that bind "TCCTG"?
I'm showing you an example of a motif called M0085_1.02; it is not the result of a search. The letter-probability matrix describes the motif. In this case, the motif is 10 bases long. Each row of the matrix is one position in the motif, and each column corresponds to one of the letters in the alphabet ACGT. So the probability of an A in the first position is 0.213214, the probability of an A in the second position is 0.222124, and so on.
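To make the matrix layout concrete, here is a minimal Python sketch of how a k-mer could be scored against a letter-probability matrix by multiplying the per-position probabilities. The matrix below is a made-up toy example, not the real M0085_1.02 motif, and the function names are hypothetical. Tools such as Tomtom use more sophisticated comparison statistics than this.

```python
ALPHABET = "ACGT"  # column order of the letter-probability matrix

# Hypothetical 6-position matrix (rows = positions, columns = A, C, G, T).
# These numbers are invented for illustration; they are NOT from CIS-BP.
TOY_MATRIX = [
    [0.10, 0.10, 0.10, 0.70],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.70],
    [0.05, 0.05, 0.85, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]


def kmer_probability(matrix, kmer, offset=0):
    """Probability of `kmer` under the motif, starting at row `offset`."""
    prob = 1.0
    for i, base in enumerate(kmer):
        prob *= matrix[offset + i][ALPHABET.index(base)]
    return prob


def best_offset(matrix, kmer):
    """Return the start position in the motif where `kmer` fits best."""
    offsets = range(len(matrix) - len(kmer) + 1)
    return max(offsets, key=lambda o: kmer_probability(matrix, kmer, o))


if __name__ == "__main__":
    start = best_offset(TOY_MATRIX, "TCCTG")
    print(start, kmer_probability(TOY_MATRIX, "TCCTG", start))
```

A k-mer shorter than the motif can match at several offsets, which is why the sketch scans all of them and keeps the best one.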
As Fidel mentioned, you might consider running Tomtom or GOMo with your "TCCTG" sequence.
written 9.4 years ago by Kamil, updated 5.4 years ago by Ram
Did you try Tomtom from the MEME Suite? It searches a query motif against many motif databases. You may have to convert your k-mers into motifs first.
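One way to turn a k-mer into a motif is to write it as a one-hot letter-probability matrix in MEME minimal motif format. The Python sketch below does that under a few assumptions: the function name and the output file name query.meme are hypothetical, the background frequencies are set to a uniform 0.25, and the optional nsites and E fields are left out of the matrix header.

```python
ALPHABET = "ACGT"


def kmer_to_meme(kmer, name=None):
    """Return MEME minimal-format text for a one-hot motif built from `kmer`."""
    kmer = kmer.upper()
    if any(base not in ALPHABET for base in kmer):
        raise ValueError("k-mer may only contain A, C, G, T")
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",  # assumed uniform background
        "",
        f"MOTIF {name or kmer}",
        f"letter-probability matrix: alength= 4 w= {len(kmer)}",
    ]
    for base in kmer:
        # One row per position: probability 1 for the observed base, 0 elsewhere.
        row = ["1.000000" if letter == base else "0.000000" for letter in ALPHABET]
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "query.meme" is a hypothetical file name chosen for this example.
    with open("query.meme", "w") as handle:
        handle.write(kmer_to_meme("TCCTG"))
```

The resulting file could then be used as the query, for example `tomtom query.meme motif_databases/CIS-BP/Homo_sapiens.meme`. Check the Tomtom documentation for the exact options in your installed version. The MEME Suite also appears to ship an iupac2meme utility for this kind of conversion, which may be simpler if it is available to you.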