This question is about taking a bioinformatics analysis on to downstream analysis.
I have identified three ChIP-seq motifs (I ran ChIP-seq, called peaks, identified enriched regions, and then identified motifs), each around 7-9 letters long with one mismatch allowed. I took the coordinates of the sites for these three motifs (selected at p-value < 0.001) and intersected them with the coordinates of all the enriched regions I had initially identified, in order to find out in which genes in my data set each selected motif is enriched. For the intersection I used bedtools with the -wo option and a 50% overlap.
Now, when I check the position in the sequence that corresponds to a motif, I don't find the exact sequence. I should see the motif (e.g. CMGGATC) with at most one base mismatch. What I find instead is only 3-4 of the bases, seemingly at random, at the same location (e.g. AGGGAA). How can I be sure that I am not designing primers or an EMSA assay for a false motif? Shouldn't the actual site contain almost all the letters of the motif, apart from one mismatch?
As you will agree, this will require a lot of effort and time. In case you wonder what I used to do all this: I used the MEME Suite and took advice from the discussion on my previous post, A: motif seq in enriched genes
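The check described in the question (does the sequence at a reported position match a consensus such as CMGGATC with at most one mismatch?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the MEME Suite does internally: the function names are made up for this example, the consensus is read as IUPAC codes (so M means A or C), and both strands are checked, because a site on the reverse strand will not look like the motif when read on the forward strand.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(consensus, site):
    """Count positions where the site base is not allowed by the consensus code."""
    return sum(base not in IUPAC[code] for code, base in zip(consensus, site))


def find_sites(consensus, sequence, max_mismatches=1):
    """Scan every window of the sequence on both strands.

    Returns a list of (0-based start, strand, site as read on that strand,
    number of mismatches) for windows within max_mismatches of the consensus.
    """
    consensus = consensus.upper()
    sequence = sequence.upper()
    k = len(consensus)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            n = mismatches(consensus, site)
            if n <= max_mismatches:
                hits.append((start, strand, site, n))
    return hits


if __name__ == "__main__":
    # Hypothetical region sequence, invented for this example.
    for hit in find_sites("CMGGATC", "AAGATCCTGAA"):
        print(hit)
```

If a scan like this over the extracted region finds the site a base or two away from where it was expected, or only on the minus strand, the motif call may be fine and the lookup may be what is off. BED files use 0-based start coordinates, so mixing them with 1-based positions shifts the window by one base. It is also worth confirming that the coordinates being inspected are those of the motif site and not those of the enriched region written alongside it by -wo.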
Did you test the candidate binding regions with qPCR, etc?
That is where the problem is: I am not confident enough in these computational approaches. What I am not sure about, when we see these logos from ChIP-seq data, is whether the complete sequence has to be present in the specified gene sequence, or only part of it. If partial, how partial: 50%? 90%?
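One way to make "how partial" concrete: a logo is a picture of a position-specific matrix, and a candidate site is usually scored against that matrix rather than compared letter by letter. The sketch below is a generic log-odds scoring example, not the exact scoring of any particular tool. The count matrix is invented purely for illustration, and the uniform background of 0.25 and the pseudocount of 1 are assumptions.

```python
import math

# Hypothetical per-position base counts for a 7-letter motif.
# These numbers are invented for illustration and come from no real dataset.
COUNTS = [
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 9, "C": 9, "G": 1, "T": 1},   # position where A and C are both common
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(probability / background) per position and base."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def score(matrix, site):
    """Sum the log-odds of each base of the site (A, C, G, T only)."""
    return sum(column[base] for column, base in zip(matrix, site.upper()))


def relative_score(matrix, site):
    """Rescale the score to 0..1 between the worst and best possible site."""
    best = sum(max(column.values()) for column in matrix)
    worst = sum(min(column.values()) for column in matrix)
    return (score(matrix, site) - worst) / (best - worst)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for site in ("CAGGATC", "CAGGTTC", "AGGGAAC"):
        print(site, round(score(pwm, site), 2), round(relative_score(pwm, site), 2))
```

With a score like this, a mismatch at a tall position in the logo costs much more than one at a short, variable position, so there is no single percentage of letters that has to match. What counts as a good enough site is a threshold choice, for example the p-value cutoff used when scanning.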
I am just wondering whether you have tested that the regions from your ChIP-seq analysis are real before trying to determine what the motif might be.
Yes, we performed qPCR, and it seems to be working.