Transcription Factor Binding Site Prediction
13.7 years ago

I want to use TF matrices to predict TF binding sites (TFBS) in regions of interest.

This is my plan:

a. Download TF matrices

I have seen TRANSFAC and JASPAR mentioned in relation to TF matrices. I have found some text files in the JASPAR database that seem like what I need and I will probably use these. Would anybody know if these are any different from the TRANSFAC matrices? Any other resources for matrices?

b. Predict TFBS in sequence of interest

For each TF matrix, predict where TFBS could be found in the sequences of interest. I have looked at the TFBS module for Perl, and although I have no reason to doubt its results, the way it searches for TFBS is not clear to me, so I would not want to use it in a serious analysis.
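
For reference, the core of a matrix scan can be sketched in a few lines of Python. This is only an illustration of the general idea (convert counts to log-odds scores, slide a window, keep windows above a cutoff), not what the Perl TFBS module or any other tool does internally. The pseudocount of 0.5, the uniform background of 0.25, the threshold, and the toy matrix are all assumptions made up for the example.

```python
import math

def pfm_to_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts {base: [count per position]} to log2-odds scores."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((pfm[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Yield (start, score) for each forward-strand window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            yield start, score

# Toy count matrix, invented for illustration (not a real JASPAR entry).
toy_pfm = {
    "A": [10, 0, 0, 10],
    "C": [0, 10, 0, 0],
    "G": [0, 0, 10, 0],
    "T": [0, 0, 0, 0],
}
toy_pwm = pfm_to_pwm(toy_pfm)
hits = list(scan("ttACGAtt", toy_pwm, threshold=5.0))
for start, score in hits:
    print(start, round(score, 2))
```

The sketch only looks at the forward strand; a real analysis would also score the reverse complement and would need a principled way of choosing the threshold.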


My questions:

  1. Are there any easy ways to bulk download TF matrices for all known TFs? (vertebrate, fly, nematode - separate for each species)

  2. Is there a fast and usable TFBS prediction program?

  • has to run from the command line
  • has to be fast (I have quite a few sequences)

Since I am completely at a loss and TF prediction is not exactly my area of expertise, I don't know if what I'm asking for is irrelevant, solved 100 times already, etc. Feel free to just point me to some relevant reviews and/or your favourite programs. It seems that all the resources I find are from the early 2000s and many are no longer functional.

transcription binding prediction • 18k views
13.7 years ago
Will 4.6k

I have a gist for exactly this. You can clone or download it from http://gist.github.com/764262

It uses the MOODS package (paper here: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778336/) to process JASPAR-style TFBS matrices and any normal seq-interval format ... but with ~5 minutes of work you could switch it over to use FASTA files.
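
A minimal sketch of the FASTA side of such a switch, in plain Python with no dependencies. It does not reproduce the gist's own interface; `parse_fasta` is a hypothetical helper name, and how its output would be handed to the scanning code is left open.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Demo on an in-memory example; the records are invented for illustration.
demo_lines = [
    ">seq1 demo",
    "ACG",
    "TAC",
    "",
    ">seq2",
    "GGCC",
]
records = list(parse_fasta(demo_lines))
for name, seq in records:
    print(name, len(seq))
```

Because the function takes any iterable of lines, an open file handle can be passed in directly, e.g. `with open(path) as handle: for name, seq in parse_fasta(handle): ...`.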

It runs blisteringly fast ... I can usually annotate all upstream-promoters of a genome within ~10 minutes.

Feel free to fork the repository and make any changes ... I always welcome pull-requests.

Hope that helps,

Will


Great package! I had been looking for something like this for some time.


Nice to hear ... let me know if it's useful.


This seems interesting. I'm not very good at Python, but maybe I could use it.


Hi, I have a problem like yours. Did you manage to solve it with the gist? I don't know how to run it, as it doesn't have a user guide. Thanks a lot in advance.


Is it accurate enough to use TFBS matrices from humans to predict TFBS in other vertebrates? Is there a relevant paper you can point me to? Is there also an up-to-date database of TFBS matrices? Thanks a lot.


Please check this paper and the related database CISBP: Weirauch, M. T., et al. (2014). "Determination and inference of eukaryotic transcription factor sequence specificity." Cell 158(6): 1431-1443.

13.7 years ago
Carl ▴ 80

Hi

A nice source of PWMs is UniPROBE: these are PWMs obtained using protein binding microarrays (check out Bulyk's lab page here). You can download a large number of PWMs freely, for all sorts of organisms (mouse, yeast, nematode, etc.). The format is a little unusual, but you can convert it to the standard TRANSFAC format (accepted by most tools) using RSA-tools convert-matrix: select tab as the input format and transfac as the output. I also suggest using RSA-tools matrix-scan.

10.7 years ago

Try using the INSECT server. It will help you with the TFBS search: you can add your own TFBS and run the search either on FASTA files or on genes from Ensembl by entering their IDs.

This is the publication http://www.ncbi.nlm.nih.gov/pubmed/24008418

13.7 years ago
Darked89 4.7k

You may check this page

13.7 years ago
Ian 6.1k

I am not a great fan of using matrices (I prefer using IUPAC patterns) for representing TFBS, as it is difficult to know at what cutoff a match counts as 'good'. When forced to do so I have used 'matrix-scan' at RSA Tools. It does at least allow the use of P-value thresholds.
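
To illustrate the IUPAC-pattern approach, here is a minimal Python sketch that turns a consensus written in IUPAC codes into a regular expression and reports every match, including overlapping ones. The function names are hypothetical, and the E-box consensus CANNTG and the test sequence are just example inputs.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(pattern):
    """Compile an IUPAC consensus into a regex that finds overlapping hits."""
    body = "".join(IUPAC[symbol] for symbol in pattern.upper())
    # A lookahead with a capture group lets finditer report overlapping sites.
    return re.compile("(?=(" + body + "))")

def find_sites(sequence, pattern):
    """Return (start, matched_text) for each forward-strand match."""
    regex = iupac_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

sites = find_sites("ttCACGTGaaCAGCTGcc", "CANNTG")
for start, site in sites:
    print(start, site)
```

A pattern either matches or it does not, which sidesteps the cutoff question at the price of ignoring how strongly each position is preferred. Non-palindromic patterns would also need to be searched on the reverse complement.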

Matrices can be directly downloaded in bulk from JASPAR; I downloaded the 'Archive.zip' and extracted the non-redundant matrices for vertebrates. I converted the JASPAR format to TRANSFAC format, as I know matrix-scan handles this well.
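
A rough Python sketch of that kind of conversion is below. It assumes the bracketed JASPAR text layout (a `>ID name` header followed by one `A [ ... ]` style row per base) and writes a minimal TRANSFAC-like record; the files in a given JASPAR archive may be laid out differently, and the exact fields your scanner expects should be checked before relying on it. The matrix ID and counts in the demo are invented.

```python
import re

def parse_jaspar(lines):
    """Yield (matrix_id, name, counts) from bracketed JASPAR-style text."""
    matrix_id, name, counts = None, None, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if matrix_id is not None:
                yield matrix_id, name, counts
            fields = line[1:].split(None, 1)
            matrix_id = fields[0]
            name = fields[1] if len(fields) > 1 else fields[0]
            counts = {}
        else:
            base = line[0].upper()  # row label: A, C, G or T
            numbers = re.findall(r"\d+(?:\.\d+)?", line[1:])
            counts[base] = [float(x) for x in numbers]
    if matrix_id is not None:
        yield matrix_id, name, counts

def to_transfac(matrix_id, name, counts):
    """Format one matrix as a minimal TRANSFAC-like record (one row per position)."""
    out = ["AC " + matrix_id, "XX", "ID " + name, "XX"]
    out.append("P0" + "".join("%7s" % base for base in "ACGT"))
    for i in range(len(counts["A"])):
        row = "".join("%7g" % counts[base][i] for base in "ACGT")
        out.append("%02d%s" % (i + 1, row))
    out += ["XX", "//"]
    return "\n".join(out)

# Toy input, invented for illustration (not a real JASPAR matrix).
demo = [
    ">MA0000.0 toy",
    "A  [ 4 19  0 ]",
    "C  [16  0 20 ]",
    "G  [ 0  1  0 ]",
    "T  [ 0  0  0 ]",
]
transfac_text = "\n\n".join(to_transfac(*m) for m in parse_jaspar(demo))
print(transfac_text)
```

Note that JASPAR stores one row per base while TRANSFAC stores one row per position, so the conversion is essentially a transpose plus the record header and terminator.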
