Background: I downloaded TFBS data from the RIKEN database and created a PWM for the MEF2 transcription factor. Now I want to test the accuracy of my PWM.
My plan is to take a 400bp (or maybe 1000bp) neighbourhood around each TFBS and then run the PWM over it to check the accuracy (i.e., the PWM should score highest in the middle of the segment, where the binding site is).
So, what I already have is data that looks like this:
>chr1:6585537-6585547
CTATAAATAG
>chr1:6767854-6767864
CTTTGTTTAG
>chr1:8686282-8686292
CTCTTAATAG
Now, based on these locations, is there an R package where I can call something like ExtractDNA(start, stop, size) to get the surrounding sequence?
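To make the coordinates I am after concrete, here is a minimal sketch (in Python rather than R, purely for illustration) that parses one of the headers above and computes a window centred on the site. The function name `window_around` is made up, and I am assuming the header coordinates are half-open, since end minus start equals the 10bp sequence length in my data.

```python
import re

def window_around(header, size=400):
    """Return (chrom, start, end) for a `size`-bp window centred on the site.

    Assumes headers look like '>chr1:6585537-6585547' and that the
    coordinates are half-open (end - start == site length).
    """
    m = re.match(r">?(\w+):(\d+)-(\d+)$", header.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    chrom = m.group(1)
    start, end = int(m.group(2)), int(m.group(3))
    mid = (start + end) // 2           # midpoint of the binding site
    half = size // 2
    win_start = max(0, mid - half)     # do not run off the chromosome start
    win_end = mid + (size - half)
    return chrom, win_start, win_end

print(window_around(">chr1:6585537-6585547", 400))
```

The resulting chromosome, start and end are what I would then need to hand to whatever tool actually fetches the sequence from the genome.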
Secondary question: once I have my genome segment, how exactly do I run my PWM on it?
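What I have in mind for the scan is roughly a sliding window, sketched below in Python under some loud assumptions: the PWM is a log-odds matrix built from aligned sites with a pseudocount of 1 and a uniform 0.25 background, only the forward strand is scored, and the flanking sequence is invented just to have something to scan. The helper names `build_log_odds` and `scan` are hypothetical, not from any package.

```python
import math

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per column) from equal-length sites."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm

def scan(pwm, seq):
    """Score every window of len(pwm) bases along seq (forward strand only).

    Assumes seq contains only A, C, G, T; an N would raise a KeyError.
    """
    seq = seq.upper()
    w = len(pwm)
    return [sum(pwm[j][seq[i + j]] for j in range(w))
            for i in range(len(seq) - w + 1)]

# The three example sites from my data; the flanks are made up.
sites = ["CTATAAATAG", "CTTTGTTTAG", "CTCTTAATAG"]
pwm = build_log_odds(sites)
segment = "GGGGGCCCCC" + "CTATAAATAG" + "GGGGGCCCCC"

scores = scan(pwm, segment)
best = max(range(len(scores)), key=scores.__getitem__)
print("best offset:", best, "score:", round(scores[best], 2))
```

If the PWM is any good, the best-scoring offset should land on the known site in the middle of each real segment; scoring the reverse complement as well would be the obvious next step.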
Edit: I think the TFBS were from hg18
Thanks. For anyone else wondering, here is a related thread: How To Get The Sequence Of A Genomic Region From Ucsc?
:) Sorry, I should've included that link. Guess my high carb lunch has induced quite the drowsiness!