I am using FIMO on some small datasets to look for motifs of interest, and it works quite well for scanning a PWM against a FASTA file I chose. However, the output looks something like the following:
MA0041.1 Foxd3 chr1:146119062-146119793 + 24 35 2.03e-07 0.0191 GATTGTTTGTTT
MA0041.1 Foxd3 chr9:95744293-95745031 - 12 23 3.27e-07 0.0191 GAATGTTTATTT
MA0041.1 Foxd3 chr16:39712307-39712704 + 29 40 5.52e-07 0.0191 gtatgtttgtTT
MA0041.1 Foxd3 chr17:55425949-55426275 + 119 130 7.77e-07 0.0191 gtttgtttgttt
MA0041.1 Foxd3 chr17:55425949-55426275 + 123 134 7.77e-07 0.0191 gtttgtttgttt
This is great, but as you can see in the last two rows, the same input interval is reported twice because the PWM matches at more than one position within it. What I am trying to do is take FIMO output such as this and pull out intervals corresponding to the exact site of the motif within the original input intervals. How can this be done? Thanks!
Perhaps one reason that you get a second hit over the same interval is that the DNA that the TF binds to is a repeat (gttt three times). The model might decide to identify neighboring regions as hits if they contain repeats that would still allow TF binding.
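As for pulling out the exact motif site, one possible approach is to add the hit offsets to the start of the input interval encoded in the sequence name. The following is only a minimal sketch under several assumptions: the columns are in the order shown in the pasted output (motif ID, motif name, sequence name, strand, start, stop, ...), which may differ from the column order in your FIMO version; the sequence names have the form chrom:start-end; the hit start and stop are 1-based and inclusive; and the interval start in the name is 0-based, as it would be if the FASTA came from something like bedtools getfasta. The function name `fimo_hit_to_bed` and its `region_start_is_zero_based` flag are hypothetical, made up for this illustration.

```python
import re

# Assumed sequence-name layout: chrom:start-end (e.g. chr1:146119062-146119793)
REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def fimo_hit_to_bed(line, region_start_is_zero_based=True):
    """Convert one whitespace-separated FIMO hit line to a BED-style tuple.

    Assumes the column order of the output pasted in the question:
    motif_id, motif_name, sequence_name, strand, start, stop, ...
    Returns (chrom, bed_start, bed_end, motif_name, strand) with a
    0-based, half-open interval.
    """
    fields = line.split()
    motif_name, seq_name, strand = fields[1], fields[2], fields[3]
    hit_start, hit_stop = int(fields[4]), int(fields[5])  # 1-based, inclusive

    match = REGION.match(seq_name)
    if match is None:
        raise ValueError(f"unexpected sequence name: {seq_name!r}")

    region_start = int(match["start"])
    if not region_start_is_zero_based:
        # Assumption switch: the name holds a 1-based start instead
        region_start -= 1

    bed_start = region_start + hit_start - 1  # shift 1-based offset to 0-based
    bed_end = region_start + hit_stop         # half-open end
    return (match["chrom"], bed_start, bed_end, motif_name, strand)


if __name__ == "__main__":
    hits = [
        "MA0041.1 Foxd3 chr17:55425949-55426275 + 119 130 7.77e-07 0.0191 gtttgtttgttt",
        "MA0041.1 Foxd3 chr17:55425949-55426275 + 123 134 7.77e-07 0.0191 gtttgtttgttt",
    ]
    for hit in hits:
        print(*fimo_hit_to_bed(hit), sep="\t")
```

With this, the last two rows of the output become two distinct 12 bp intervals that overlap and are offset by 4 bp, which is consistent with the gttt repeat explanation. If your FASTA headers use 1-based starts, pass `region_start_is_zero_based=False`; it is worth checking one hit against the genome to confirm which convention applies.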