how to interpret scores from a PWM match
9.9 years ago
Affan ▴ 310

I created a PWM using true binding sites from Riken 4 database, hg 18.

To test my PWM, I picked a random true binding site and extracted a ~1000 bp neighbourhood around it, with the binding site as close to the center of the segment as possible (i.e. the binding site is not EXACTLY in the middle of the segment). After running my PWM I get the following results:

-----------------
True binding site: CTCTTAATAG 

Views on a 1011-letter DNAString subject
subject: AGTGCACTTGCTAAAACAAAAGGAGGCCTGAGCGGCCGCAGGGCACCGCGGCG...TAACAGATTACCAACTGTTAATTTCAAACTAATTTCTTACCCACCCACAATTA
views:
    start end width
[1]   648 659    12 [TTTATTTCAAAG]
[2]   748 759    12 [TTTGTTTAAAAA]
[3]   885 896    12 [GCTTTAAATAAA]
[4]   940 951    12 [TCAATTTTTATG]
DataFrame with 4 rows and 1 column
      score
  <numeric>
1 0.8362088
2 0.8342433
3 0.8309675
4 0.8779209

In this particular example, the true binding site is

views:
    start end width
[1]   456 465    10 [CTCTTAATAG]

Now my first job here is to get the accuracy (sensitivity and specificity) of the PWM. To do this I am looking for a way to detect false positives, but I am unsure how to go about it. According to the results above, 4 sites scored higher than 40%. How do I incorporate this data into an accuracy analysis?
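To make the question concrete, here is a minimal base-R sketch of one way the four hits above could be labelled. It assumes a hit counts as a true positive only if it overlaps the annotated site (456-465), and that the neighbourhood contains no other, unannotated binding sites; both are assumptions of this sketch, not something the data confirms. The names `hits`, `true_site` and `overlaps_site` are made up for illustration.

```r
# Hits reported above (start, end, score) and the annotated site's position.
hits <- data.frame(
  start = c(648, 748, 885, 940),
  end   = c(659, 759, 896, 951),
  score = c(0.8362088, 0.8342433, 0.8309675, 0.8779209)
)
true_site <- c(start = 456, end = 465)

# Two closed intervals overlap when each one starts before the other ends.
overlaps_site <- function(start, end, site) {
  start <= site[["end"]] & end >= site[["start"]]
}

hits$true_positive <- overlaps_site(hits$start, hits$end, true_site)

n_tp <- sum(hits$true_positive)    # hits that recover the annotated site
n_fp <- sum(!hits$true_positive)   # hits elsewhere in the neighbourhood
site_found <- n_tp > 0             # FALSE means the site is a false negative

print(hits)
cat("TP:", n_tp, " FP:", n_fp, " site found:", site_found, "\n")
```

Under these assumptions none of the four hits overlaps positions 456-465, so this region would contribute four false positives and one false negative at the current cutoff.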

Background information: I have exactly 1875 true binding sites. I can replicate the above analysis for all of these binding sites (i.e., grab the neighbourhood, apply the PWM, analyze the scores). All programming is done with R.
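If the per-region analysis were repeated for every site, the counts could be pooled roughly as in the sketch below. The `regions` table holds made-up toy numbers purely for illustration (in practice there would be one row per true binding site), and the definition of a "negative" (every scanned window on one strand that does not overlap the annotated site) is an assumption of this sketch rather than an established convention.

```r
# Toy per-region summary; all values are invented for illustration only.
regions <- data.frame(
  region_length = c(1011, 1000, 1000),
  site_found    = c(FALSE, TRUE, TRUE),   # did any hit overlap the true site?
  n_fp          = c(4, 1, 0)              # hits not overlapping the true site
)
pwm_width  <- 12   # width of the reported hits
site_width <- 10   # width of the annotated sites

# Windows scanned per region on a single strand.
n_windows <- regions$region_length - pwm_width + 1

# Window start positions that overlap the site (assumes the site is not
# at the very edge of the region).
n_overlapping <- pwm_width + site_width - 1

# Remaining windows are treated as negatives; those not called are TNs.
n_negative <- n_windows - n_overlapping
n_tn       <- n_negative - regions$n_fp

sensitivity <- mean(regions$site_found)        # TP / (TP + FN), per site
specificity <- sum(n_tn) / sum(n_negative)     # TN / (TN + FP), per window

cat("sensitivity:", sensitivity, " specificity:", specificity, "\n")
```

Because negatives vastly outnumber positives in a ~1000 bp window, specificity computed this way stays close to 1 even with several false hits per region, so repeating the calculation over a range of score cutoffs may be more informative than a single pair of numbers.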

Secondary question: Do I have to take care of the strand information? My true binding site data looks like this:

>chr1:6585537-6585547
CTATAAATAG
>chr1:6767854-6767864
CTTTGTTTAG
>chr1:8686282-8686292
CTCTTAATAG
>chr1:10660923-10660933
GTATTTTTAA
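On the strand question, one thing that can be checked directly is what each site looks like on the opposite strand. The base-R sketch below reverse-complements the four sequences above; `reverse_complement` is a helper written here for illustration (Biostrings provides `reverseComplement()` for the same purpose). Whether the FASTA entries are all reported on one strand is not known from the data shown, so this is only a way to inspect the issue.

```r
# Reverse-complement DNA strings using base R only.
reverse_complement <- function(seqs) {
  complemented <- chartr("ACGT", "TGCA", toupper(seqs))
  vapply(
    strsplit(complemented, ""),
    function(chars) paste(rev(chars), collapse = ""),
    character(1),
    USE.NAMES = FALSE
  )
}

sites <- c("CTATAAATAG", "CTTTGTTTAG", "CTCTTAATAG", "GTATTTTTAA")
rc <- reverse_complement(sites)

# Side-by-side view of each site and its opposite-strand sequence.
print(data.frame(forward = sites, reverse_complement = rc))
```

If a site can occur in either orientation, a scan of the forward strand alone would only see the forward form, so the neighbourhood (or the PWM) would also need to be reverse-complemented before scoring.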
PWM tfbs • 2.5k views