So, I'm doing a small project for a bioinformatics class, part of which consists of generating the transcriptional regulation network of Saccharomyces cerevisiae. For the sake of simplicity, I'm focusing only on interactions between transcription factors (TFs). I'm using JASPAR to fetch the binding motif of each TF as a PSSM. Then, for each TF, I scan for matches in the upstream sequence (400 bp) of each TF gene, using a threshold score defined by Hertz and Stormo as provided in Biopython, and assemble the network from the hits.
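To make the procedure concrete, here is a minimal pure-Python sketch of the scan-and-threshold idea. It is not my actual pipeline and not Biopython's implementation: the motifs, promoter sequences, TF names (`TF_A`, `TF_B`), the fixed threshold of 8.0, and all function names are made up for illustration, and a uniform background is assumed.

```python
import math

BASES = "ACGT"

def consensus_counts(consensus, n_sites=10):
    """Toy count matrix: every hypothetical site matches the consensus."""
    return [{b: (n_sites if b == c else 0) for b in BASES} for c in consensus]

def log_odds_pssm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds PSSM."""
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pssm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pssm

def revcomp(seq):
    """Reverse complement of an A/C/G/T-only sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_score(pssm, seq):
    """Highest PSSM score over all windows on both strands.

    Assumes seq contains only A, C, G, T (no N or other symbols).
    """
    width = len(pssm)
    best = -math.inf
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            score = sum(pssm[j][strand[i + j]] for j in range(width))
            best = max(best, score)
    return best

def build_network(motifs, promoters, thresholds):
    """Add an edge TF -> target when the TF's best hit in the
    target's promoter reaches that TF's threshold."""
    edges = set()
    for tf, pssm in motifs.items():
        for target, seq in promoters.items():
            if best_score(pssm, seq) >= thresholds[tf]:
                edges.add((tf, target))
    return edges

# Hypothetical toy data, not real yeast motifs or promoters.
motifs = {
    "TF_A": log_odds_pssm(consensus_counts("TGACTC")),
    "TF_B": log_odds_pssm(consensus_counts("CACGTG")),
}
promoters = {
    "TF_A": "AAAACACGTGAAAA",  # carries a TF_B site
    "TF_B": "TTTTGAGTCATTTT",  # carries a TF_A site on the reverse strand
}
thresholds = {"TF_A": 8.0, "TF_B": 8.0}  # arbitrary fixed cutoffs

edges = build_network(motifs, promoters, thresholds)
print(sorted(edges))
```

In the real setup the PSSMs come from JASPAR, the promoters are the 400 bp upstream regions, and the per-TF thresholds come from the score distribution rather than a fixed number.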
The thing is, I came across a 2002 paper by Lee et al. that experimentally determined the whole network (taking into account all TF-DNA interactions, not just TF-TF as I'm doing). To my surprise, they also made the TF-TF network available in their data, and it looks NOTHING like the one I assembled with the method described above. Their network has 108 edges (even though they note that about a third of the interactions are not reported), while mine has >1000. I initially thought the difference might come from being too loose with the threshold score or the scanned region length, but tightening either one doesn't change much, and the network never comes close to the experimentally reported one.
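To put a number on "looks nothing like", one option is to compare the two networks as sets of directed edges. The sketch below is only an illustration: `compare_networks` is a hypothetical helper, the edge lists are toy data rather than my network or the Lee et al. data, and treating the experimental network as the reference is an assumption, since it is itself incomplete.

```python
def compare_networks(predicted, reference):
    """Overlap statistics between two sets of directed (TF, target) edges."""
    shared = predicted & reference
    union = predicted | reference
    return {
        "shared": len(shared),
        # Fraction of predicted edges that are also in the reference.
        "precision": len(shared) / len(predicted) if predicted else 0.0,
        # Fraction of reference edges that were recovered.
        "recall": len(shared) / len(reference) if reference else 0.0,
        # Shared edges over all edges seen in either network.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Toy edge sets with placeholder TF names.
predicted = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")}
reference = {("A", "B"), ("C", "A"), ("C", "B")}

stats = compare_networks(predicted, reference)
print(stats)
```

Tracking these numbers while varying the threshold or the upstream length shows whether tightening removes mostly unsupported edges or also the shared ones.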
I know that a match scoring above a threshold doesn't mean the TF actually binds there in vivo, since other factors also determine binding affinity, but it shocks me that the two networks turn out to be so different.
Any ideas on how to assemble a more realistic network based solely on binding motifs and promoter sequences?