Hi everyone,
I have a question about something I cannot understand. I need to check whether some known motifs are present in certain regions of the human genome. To do that, I downloaded the genome-wide known motif positions for human from HOMER (http://homer.ucsd.edu/homer/).
The downloaded data is a typical BED file: chrom / start / end / name-of-the-known-motif / score / strand
According to the documentation, that score should be of the ZOOPS type, i.e. zero or one occurrence per sequence. If I understood it correctly, they take the motif, then pick a target set and a background set (random regions with the same GC content in %), compute the ZOOPS score in both, and use a statistical model (hypergeometric or binomial) to decide whether the result is significant.
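To make my understanding concrete, here is a minimal Python sketch of how I picture the ZOOPS enrichment test. It assumes the hypergeometric variant; the function name and the counts in the usage line are my own illustration, not HOMER's code or output.

```python
from math import comb


def zoops_hypergeom_pvalue(n_target, k_target, n_background, k_background):
    """Upper-tail hypergeometric p-value under ZOOPS counting.

    Each sequence contributes 0 or 1 (motif absent or present), no matter
    how many times the motif occurs inside it.
    n_* = number of sequences, k_* = number of sequences with the motif.
    """
    total = n_target + n_background          # all sequences
    with_motif = k_target + k_background     # all sequences containing the motif
    denom = comb(total, n_target)
    upper = min(n_target, with_motif)
    # P(X >= k_target): chance of seeing at least this many motif-containing
    # sequences in the target set if the motif were spread at random.
    tail = sum(
        comb(with_motif, i) * comb(total - with_motif, n_target - i)
        for i in range(k_target, upper + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only
p = zoops_hypergeom_pvalue(n_target=100, k_target=30, n_background=1000, k_background=100)
print(p)
```

If this picture is right, the result is a per-motif enrichment statistic over a whole set of sequences, which is part of why the per-site value in column 5 confuses me.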
Now, here comes the problem: for each known motif, I would expect scores ranging from 1 to the length of the motif itself, but sometimes this is not the case. For example:
chr20 1980289 1980313 ZFP3(Zf) 30 +
chr20 23160564 23160588 ZFP3(Zf) 30 -
chr20 45998341 45998365 ZFP3(Zf) 30 -
Since ZFP3(Zf) is 25 bp long, which is also clear from the coordinates (end - start + 1 = 25), how is it possible that the ZOOPS score is 30? Shouldn't it be the sum of the zeros and ones obtained by comparing the motif with the target sequence, position by position? So shouldn't it be AT MOST 25?
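The arithmetic behind my confusion can be checked with a short Python sketch over the three lines above. It assumes whitespace-separated columns and that start and end are both inclusive (my reading, not something I have confirmed in the documentation); the variable names are my own.

```python
rows = """\
chr20 1980289 1980313 ZFP3(Zf) 30 +
chr20 23160564 23160588 ZFP3(Zf) 30 -
chr20 45998341 45998365 ZFP3(Zf) 30 -
"""


def parse(line):
    """Split one line into typed fields (column order as described above)."""
    chrom, start, end, name, score, strand = line.split()
    return chrom, int(start), int(end), name, float(score), strand


records = [parse(line) for line in rows.splitlines() if line.strip()]

for chrom, start, end, name, score, strand in records:
    span = end - start            # plain difference of columns 3 and 2
    inclusive_len = span + 1      # length if both coordinates are inclusive
    print(name, chrom, start, "span:", span, "inclusive length:", inclusive_len,
          "score:", score, "score > length:", score > inclusive_len)
```

In all three rows the plain difference is 24 and the inclusive length is 25, while the score is 30, so the score exceeds the motif length either way.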
Please tell me what I am misunderstanding.
Thank you very much in advance.
Cheers, Sergio