Get transcription factor motif from list of gene symbols
2.1 years ago
Trivas

I've generated a list of transcription factors (TFs) that I'm interested in and was wondering if there is an efficient way to get their DNA-binding motifs. I have a .csv with the Ensembl gene ID as well as the gene symbol for each TF.
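
A minimal sketch of the lookup step, assuming the motifs have been downloaded as one MEME-format text file (JASPAR, for example, offers its collections in MEME format) and that the .csv has columns named `ensembl_id` and `symbol` (hypothetical column names; adjust to your file). The motif IDs, names, and matrices below are toy placeholders, not real database entries. Matching is by upper-cased gene symbol only, so heterodimer entries (e.g. `X::Y`) or motifs named differently from your gene symbols will be reported as missing.

```python
import csv
import io


def parse_meme_motifs(text):
    """Parse MEME-format text into {NAME: [(motif_id, matrix), ...]}.

    Each matrix is a list of [pA, pC, pG, pT] rows, one per motif position.
    Names are upper-cased so lookups by gene symbol are case-insensitive.
    """
    motifs = {}
    current = None      # matrix rows of the motif currently being read
    in_matrix = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("MOTIF"):
            parts = line.split()
            motif_id = parts[1]
            name = parts[2] if len(parts) > 2 else parts[1]  # alt name is optional
            current = []
            motifs.setdefault(name.upper(), []).append((motif_id, current))
            in_matrix = False
        elif line.startswith("letter-probability matrix") and current is not None:
            in_matrix = True
        elif in_matrix:
            try:
                row = [float(x) for x in line.split()]
            except ValueError:
                row = []
            if len(row) == 4:
                current.append(row)
            else:
                in_matrix = False  # a blank line or 'URL ...' line ends the matrix
    return motifs


def motifs_for_genes(csv_text, motifs, symbol_col="symbol"):
    """Split the gene list into symbols with a motif and symbols without one."""
    found, missing = {}, []
    for record in csv.DictReader(io.StringIO(csv_text)):
        symbol = record[symbol_col].strip().upper()
        if symbol in motifs:
            found[symbol] = motifs[symbol]
        else:
            missing.append(symbol)
    return found, missing


# Toy inputs (hypothetical IDs, names, and probabilities).
MEME_TEXT = """MEME version 4

ALPHABET= ACGT

MOTIF EX0001.1 GENE_A
letter-probability matrix: alength= 4 w= 3 nsites= 10 E= 0
0.10 0.10 0.70 0.10
0.10 0.80 0.00 0.10
0.00 0.10 0.90 0.00

MOTIF EX0002.1 Gene_b
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
0.20 0.80 0.00 0.00
0.90 0.00 0.10 0.00
"""

GENES_CSV = """ensembl_id,symbol
ENSG_HYPOTHETICAL_1,GENE_A
ENSG_HYPOTHETICAL_2,GENE_B
ENSG_HYPOTHETICAL_3,GENE_C
"""

found, missing = motifs_for_genes(GENES_CSV, parse_meme_motifs(MEME_TEXT))
print(sorted(found), missing)
```

The `missing` list is a quick way to see how much of the TF list a given database actually covers before trying the next one.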

Furthermore, is there a good way to compare the similarity between these DNA motifs? Either an "alignment" of the motifs to see if there is a clear overlap, or a way to compare their GC content (without calculating it manually). Tomtom doesn't seem to fit what I want exactly, and none of the other MEME Suite programs seem to fit my needs either.
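
For the comparison question, one simple approach (a simplified stand-in, without any of Tomtom's scoring statistics or p-values) is to slide one PWM along the other on both strands without gaps and average the Pearson correlation of the aligned columns. The expected GC content of a motif can be read straight off the PWM as the mean of p(C) + p(G) per position. Rows are assumed to be [A, C, G, T] probabilities, as in MEME-format matrices; the example motif is a toy.

```python
import math

# Rows are [pA, pC, pG, pT]; complementing swaps A<->T and C<->G.
COMPLEMENT = [3, 2, 1, 0]


def reverse_complement(pwm):
    return [[row[i] for i in COMPLEMENT] for row in reversed(pwm)]


def pearson(x, y):
    """Pearson correlation of two probability columns (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den > 0 else 0.0


def best_alignment(pwm_a, pwm_b, min_overlap=4):
    """Ungapped alignment of pwm_b against pwm_a on both strands.

    Returns (score, offset, strand). score is the mean column correlation over
    the overlap; position j of pwm_b (or of its reverse complement, for strand
    "-") sits at position j + offset of pwm_a.
    """
    best = (-math.inf, 0, "+")
    for strand, b in (("+", pwm_b), ("-", reverse_complement(pwm_b))):
        needed = min(min_overlap, len(pwm_a), len(b))
        for offset in range(-(len(b) - 1), len(pwm_a)):
            pairs = [(pwm_a[i], b[i - offset])
                     for i in range(len(pwm_a)) if 0 <= i - offset < len(b)]
            if len(pairs) < needed:
                continue  # skip overlaps too short to be meaningful
            score = sum(pearson(x, y) for x, y in pairs) / len(pairs)
            if score > best[0]:
                best = (score, offset, strand)
    return best


def expected_gc(pwm):
    """Expected GC fraction of sites drawn from the PWM: mean of pC + pG per position."""
    return sum(row[1] + row[2] for row in pwm) / len(pwm)


# Toy motif with consensus AACG; its reverse complement has consensus CGTT.
a = [[0.85, 0.05, 0.05, 0.05],
     [0.85, 0.05, 0.05, 0.05],
     [0.05, 0.85, 0.05, 0.05],
     [0.05, 0.05, 0.85, 0.05]]
b = reverse_complement(a)
print(best_alignment(a, b))  # best hit is on the "-" strand at offset 0
print(expected_gc(a))
```

Averaging over the overlap favours short overlaps, which is why `min_overlap` (an arbitrary default here) is enforced.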

Tags: motif, factor, R, transcription

Something like TRANSFAC would likely have this information readily available, but it is a commercial database.

Try your luck with these free options: https://en.wikipedia.org/wiki/Transcription_factor_binding_site_databases

Factorbook has a full list of human TFs: https://www.factorbook.org/tf/human

Factorbook also seems to offer UMAP clustering of motifs: https://www.factorbook.org/motif/human/meme-umap

I saw Factorbook, but it doesn't seem like there's a way to get motifs in bulk. If I can convert from gene symbol to ENCODE experiment ID (e.g. https://www.encodeproject.org/experiments/ENCSR437GBJ/), I should be able to use the MEME PWM from Factorbook to extract the motif sequence. Regardless, ~50% of my TFs aren't listed on that site, so I'm still hunting around for solutions. I'll check out some of the other resources on that wiki page, thanks!
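
Once you have a MEME-style PWM (rows of A, C, G, T probabilities), a consensus sequence can be derived column by column. This sketch uses an arbitrary rule: report a base if its probability is at least 0.6, a two-base IUPAC code if the top two bases together reach 0.75, and N otherwise. The thresholds and the toy matrix are assumptions, and the PWM itself remains the more informative representation.

```python
# Two-base IUPAC codes; single bases are reported as-is, everything else as N.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def consensus(pwm, single=0.6, pair=0.75):
    """Consensus string from [pA, pC, pG, pT] rows (thresholds are arbitrary choices)."""
    out = []
    for row in pwm:
        ranked = sorted(zip(row, "ACGT"), reverse=True)  # highest probability first
        (p1, b1), (p2, b2) = ranked[0], ranked[1]
        if p1 >= single:
            out.append(b1)
        elif p1 + p2 >= pair:
            out.append(IUPAC_PAIRS[frozenset(b1 + b2)])
        else:
            out.append("N")
    return "".join(out)


toy_pwm = [[0.90, 0.05, 0.03, 0.02],
           [0.45, 0.05, 0.45, 0.05],
           [0.25, 0.25, 0.25, 0.25],
           [0.00, 0.10, 0.00, 0.90]]
print(consensus(toy_pwm))  # ARNT
```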

https://www.factorbook.org/motif-sites/ seems to offer bulk downloads of motifs, but it may not be exactly what you are looking for.

About a decade ago, when starting my PhD project, I used JASPAR and HOMER to obtain/derive TF binding motifs, and that is probably where you will still find them today. However, I am not sure whether the whole "TF binding motif" concept is still state of the art.
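
HOMER's .motif files are plain text: a header line starting with `>` (tab-separated consensus, motif name, detection threshold, and further fields) followed by one row of A/C/G/T probabilities per position. A minimal parser that also checks which TFs in a list have a motif might look like the sketch below. It assumes motif names follow the pattern of the bundled HOMER entries (`NAME(Family)/Source/Homer`) and takes the text before the first `(` or `/` as the gene symbol, which will miss composite or differently named motifs. The example header and matrix are toy values.

```python
import re


def parse_homer_motifs(text):
    """Parse HOMER .motif text into a list of dicts (consensus, name, symbol, matrix)."""
    motifs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            fields = line[1:].split("\t")
            name = fields[1] if len(fields) > 1 else fields[0]
            # Assumed naming scheme NAME(Family)/Source/Homer -> keep NAME.
            symbol = re.split(r"[(/]", name, maxsplit=1)[0].strip().upper()
            motifs.append({"consensus": fields[0], "name": name,
                           "symbol": symbol, "matrix": []})
        elif motifs:
            # Probability row for the most recent header: pA pC pG pT.
            motifs[-1]["matrix"].append([float(x) for x in line.split()])
    return motifs


def coverage(symbols, motifs):
    """Return (symbols with a HOMER motif, symbols without one), both sorted."""
    available = {m["symbol"] for m in motifs}
    wanted = {s.upper() for s in symbols}
    return sorted(wanted & available), sorted(wanted - available)


# Toy file content: hypothetical name and probabilities, tab-separated header.
HOMER_TEXT = (
    ">ACG\tGENEA(Zf)/Example-ChIP-Seq/Homer\t5.0\n"
    "0.8\t0.1\t0.05\t0.05\n"
    "0.1\t0.8\t0.05\t0.05\n"
    "0.05\t0.05\t0.8\t0.1\n"
)
motifs = parse_homer_motifs(HOMER_TEXT)
print(coverage(["GeneA", "GeneB"], motifs))
```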

I am not an active scientist any more, but from random papers or talks I happened to see nonetheless, I got the impression that the gene regulation field has mostly moved away from those methods. To me, it seems that papers like de Boer et al. have pretty much dismantled the concept of a binary TF that either binds or doesn't bind a particular sequence.

Possibly, there are more timely tools out there like AI models that accept some TFs as input?

I was able to use the default .motif files that come with HOMER to extract some motif sequences, but ~80% of my TF list is not represented in HOMER. I'll have to read those papers you linked, but I guess a better way of phrasing what I'm looking for is sequence redundancy in my TF list: can I group or "cluster" my TFs into those that recognize similar sequences, and then use that grouping to infer biological meaning in my specific case? I'm also a bit removed from the gene regulation field at this point, but I don't remember hearing enough to disregard the idea of sequence-specific TF binding.
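
To group TFs by motif similarity, one option is single-linkage clustering on a precomputed table of pairwise motif scores (for example parsed from Tomtom output, or any column-correlation score you trust): two TFs end up in the same group whenever they are connected by a chain of pairs scoring at or above a threshold. The TF names, scores, and the 0.7 threshold below are hypothetical.

```python
def cluster_by_similarity(names, similarity, threshold):
    """Single-linkage groups: union every pair whose score is >= threshold."""
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise motif similarity scores (higher = more similar).
names = ["TF1", "TF2", "TF3", "TF4"]
similarity = {("TF1", "TF2"): 0.9, ("TF2", "TF3"): 0.8,
              ("TF3", "TF4"): 0.2, ("TF1", "TF4"): 0.1}
print(cluster_by_similarity(names, similarity, threshold=0.7))
```

Single linkage can chain loosely related motifs into one large group (TF1 and TF3 are grouped here only through TF2), so it may be worth comparing against average linkage, e.g. via `scipy.cluster.hierarchy.linkage(..., method="average")`.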

"Possibly, there are more timely tools out there like AI models that accept some TFs as input?"

Willing to try anything if people know of any AI models!

I didn't mean to say that sequence-specific TF binding should be disregarded entirely. My takeaway from those papers is that TFs do have a preference for a specific motif, but that one can't infer binding at a particular site from the sequence and the isolated TF alone. Instead, cofactors and the binding properties of other TFs expressed in the same cell need to be considered for an accurate prediction.

If your ultimate goal is not to infer target gene expression, but rather group your TFs by similarity, you might also want to consider sequence similarity and functional domains on a protein level as well as presence in similar multi-protein complexes.
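
A protein-level grouping along those lines could start from domain annotations (for example Pfam or InterPro domains per TF, retrieved separately) and compare TFs by the Jaccard index of their domain sets. This is a minimal sketch; the TF names and domain assignments below are hypothetical placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_domain_similarity(domains):
    """Jaccard similarity of domain sets for every unordered pair of TFs."""
    return {(x, y): jaccard(domains[x], domains[y])
            for x, y in combinations(sorted(domains), 2)}


# Hypothetical TFs and domain annotations.
domains = {
    "TF1": {"zf-C2H2", "KRAB"},
    "TF2": {"zf-C2H2"},
    "TF3": {"HLH", "PAS"},
}
print(pairwise_domain_similarity(domains))
```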