I downloaded TRANSFAC PWM data from this link. Judging by the date, the data are really old: they are from 1997, and the TRANSFAC version is 3.7. Is there any newer downloadable TRANSFAC matrix data? I checked, and the current TRANSFAC version is 8.3. I know the free TRANSFAC data is limited and that I need to pay to get the full data. For now I'm fine with the free data, but I need the matrix file in a form I can process automatically (txt or any other text format). PROMO is another tool that uses TRANSFAC data, but I cannot download the matrices from it. Any suggestions? Thank you very much.
Yes, I have tried that, but it gives me a web page as the result. Do you know how to download the file, or must I parse the data out of the HTML?
It gave me this page:
http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3
SelectSpecies SelectFactors ViewMatrices SearchSites MultiSearchSites
I selected ViewMatrices.
A list of matrices appeared on the right side; I picked one at random and got this:
Antp [T00026]
Matrix and consensus sequence:
A  0 9 9 0 3 2
C  0 0 0 0 0 0
G  0 0 0 0 3 5
T  9 0 0 9 3 2
   T A A T A G
But this means copying each matrix out by hand, which is not practical.
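If copying from the page is the only route, a small script can at least turn the pasted text into something machine-readable. Here is a minimal sketch in Python, assuming the copied text looks like the Antp example above: four rows of integer counts, either unlabeled (taken to be in A, C, G, T order) or prefixed with the base letter. The function names are my own and are not part of PROMO, and the tie-breaking rule in consensus() is only a guess that happens to reproduce the consensus shown for this matrix.

```python
BASES = ("A", "C", "G", "T")

# Text as copied from the ViewMatrices page (the Antp example).
EXAMPLE = """Antp [T00026]
Matrix and consensus sequence:
A
C
G
T
0 9 9 0 3 2
0 0 0 0 0 0
0 0 0 0 3 5
9 0 0 9 3 2
T A A T A G
"""


def parse_promo_matrix(text):
    """Return {base: [counts per position]} from pasted matrix text."""
    matrix = {}
    for line in text.splitlines():
        tokens = line.split()
        label = None
        # Accept rows written as "A 0 9 9 0 3 2" as well as bare counts.
        if len(tokens) > 1 and tokens[0] in BASES:
            label, tokens = tokens[0], tokens[1:]
        if not tokens or not all(t.isdigit() for t in tokens):
            continue  # header, base labels, consensus line, blank line
        if label is None:
            if len(matrix) >= len(BASES):
                raise ValueError("more than four count rows found")
            # Assumption: unlabeled rows come in A, C, G, T order.
            label = BASES[len(matrix)]
        matrix[label] = [int(t) for t in tokens]
    if set(matrix) != set(BASES):
        raise ValueError("expected one count row per base")
    if len({len(row) for row in matrix.values()}) != 1:
        raise ValueError("count rows differ in length")
    return matrix


def consensus(matrix):
    """Most frequent base per position; ties go to the first of A, C, G, T."""
    width = len(matrix["A"])
    return "".join(max(BASES, key=lambda b: matrix[b][i]) for i in range(width))


def position_rows(matrix):
    """Transpose to one (A, C, G, T) tuple per position."""
    return list(zip(*(matrix[b] for b in BASES)))


if __name__ == "__main__":
    m = parse_promo_matrix(EXAMPLE)
    print(consensus(m))
    for i, counts in enumerate(position_rows(m), start=1):
        print(f"{i:02d}\t" + "\t".join(str(c) for c in counts))
```

Comparing the output of consensus() with the consensus line on the page is a cheap sanity check that the rows were read in the right order.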
These old posts may help as well:
Best Database Of Transcription Factor Binding Sites
See this paper, it may help.
http://nar.oxfordjournals.org/content/44/D1/D144.full.pdf+html
There is also this site, from 2015:
http://www.biobase-international.com/product/transcription-factor-binding-sites
and this database, from 2010:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2841680/
Thank you very much.
I'm sorry, but I have another question. Does the method you showed me for getting matrices from PROMO cover only human, or all species? I notice that PROMO's other functions have a species filter, but when I go to the other tools and click ViewMatrices, I only have to choose transcription factor names. I have successfully retrieved the data, but I'm wondering which species it applies to. Thank you very much.
See this paper:
http://nar.oxfordjournals.org.sci-hub.cc/content/31/13/3651.short
At least 4 species: humans, mouse, chicken and frog.
I will try to find a full version.
"PROMO ‘MultiSearchSites’ output example. The example corresponds to the regulatory region of the cardiac alpha-actin gene from four different vertebrate species: humans, mouse, chicken and frog. Only those binding site predictions that appear in all four sequences are shown, as boxes of different colour and number. The image below, where the sequences are shown, is the result of selecting ‘Zoom’ in the main results page above. The image on the right is a detail of the SRF (serum response factor)-binding site predictions on the sequences. It also shows the weight matrix for the SRF recognition site and random expectation (RE) values for different levels of sequence-matrix similarity. The RE is calculated with a model that considers that all nucleotides are equally probable and also with a model that considers the nucleotide composition in the query sequence (in the picture represented by blue bars below matrix)."