A paper recently identified a motif in Drosophila that is poorly conserved. What I would like to do is search the Drosophila genome for all instances of said motif in a way that allows for mismatches at particular positions, and generate an interval file with the start and end coordinates for all instances of the motif in the genome. In addition to knowing the start and end coordinates, I would like to know the DNA sequence associated with these coordinates, as the motif will often vary from the consensus.
To be clear, what I want to do is the opposite of motif discovery. I already know the motif, and would like to find every location where it occurs, along with the actual sequence at each location.
I suspect there are tools for this, but I have not yet done any motif analysis, so I would appreciate any help. I tried the search function, but most threads seem to be about motif discovery rather than my particular need.
Thanks
Have you looked at the matchPWM() function from the R Biostrings package? It can likely do what you want.
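If the allowed mismatches are confined to known positions, another option is to write the motif as an IUPAC consensus (for example W for A or T) and scan for it directly. Here is a minimal Python sketch of that idea. It is a plain regex scan, not matchPWM() itself, and the function names (scan, motif_to_regex), the motif GATWA, and the chromosome name are made up for illustration. It reports hits on both strands as 0-based, half-open intervals (BED-style) together with the sequence actually found at each hit.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a sequence, including IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex (hypothetical helper).

    The pattern sits inside a lookahead so overlapping hits are all found.
    """
    body = "".join("[" + IUPAC[base] + "]" for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan(chrom, seq, motif):
    """Yield (chrom, start, end, matched_sequence, strand) for every hit.

    Coordinates are 0-based, half-open, on the forward strand.
    Minus-strand hits report the sequence as read on the minus strand.
    """
    seq = seq.upper()  # treat soft-masked (lowercase) bases as normal bases
    k = len(motif)
    forward = motif_to_regex(motif)
    reverse = motif_to_regex(reverse_complement(motif))
    for m in forward.finditer(seq):
        yield (chrom, m.start(), m.start() + k, m.group(1), "+")
    for m in reverse.finditer(seq):
        yield (chrom, m.start(), m.start() + k,
               reverse_complement(m.group(1)), "-")


# Toy example: a made-up motif and a made-up stretch of sequence.
hits = list(scan("chr2L", "ccGATAAttTTATCgg", "GATWA"))
for hit in hits:
    print("\t".join(str(field) for field in hit))
```

The printed lines are tab-separated, so they can be redirected to a file and used as an interval file after reordering or padding the columns to whatever your downstream tool expects. For a real genome you would call scan() once per chromosome after reading the sequences from a FASTA file. Note that a palindromic motif will be reported once on each strand at the same coordinates.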