Question

Critical Issues in PSWM Usage

13.4 years ago

Anima Mundi ★ 2.9k

Hello,

I would like you to share your personal opinions about the current critical issues in the use of PSWMs (position-specific weight matrices). Pointers to specific literature are also welcome.

Updated 13.4 years ago by Lyco ★ 2.3k • written 13.4 years ago by Anima Mundi ★ 2.9k

What do you mean by "critical issues"? Please give some examples.

13.4 years ago by Michael Schubert ★ 7.1k

By "critical issues" I mean limitations of the approach and/or advantages of other techniques over PSWMs for motif definition/discovery. One example of a critical issue is the lack of correlation between positions in the matrices: each column is treated independently of the others.

13.4 years ago by Anima Mundi ★ 2.9k
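Regarding the lack of correlation raised above: assuming it refers to the positional-independence assumption (each PSWM column is estimated on its own), here is a minimal Python sketch of the consequence. Two hypothetical site collections with opposite dependencies between their two positions collapse into the same count matrix, so the matrix gives a pair that never occurs in one collection the same probability as one that does. The function names and toy sites are illustrative only.

```python
from collections import Counter

def count_matrix(sites):
    """One Counter of base counts per column of equal-length aligned sites."""
    return [Counter(site[i] for site in sites) for i in range(len(sites[0]))]

def site_probability(matrix, seq, n_sites):
    """Product of per-column frequencies: the positional-independence model of a PSWM."""
    p = 1.0
    for column, base in zip(matrix, seq):
        p *= column[base] / n_sites  # Counter returns 0 for unseen bases
    return p

# Hypothetical site collections with opposite dependencies between the two positions:
# in set_a the positions always agree, in set_b they always differ.
set_a = ["AA", "TT", "AA", "TT"]
set_b = ["AT", "TA", "AT", "TA"]

matrix_a = count_matrix(set_a)
print(matrix_a == count_matrix(set_b))      # True: the dependency is lost
print(site_probability(matrix_a, "AA", 4))  # 0.25
print(site_probability(matrix_a, "AT", 4))  # 0.25, although "AT" never occurs in set_a
```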

Ram · Answer 1 · 2011-07-18

The first thing I teach biology students in bioinformatics class is my own variant of Dobzhansky's adage:

Nothing in bioinformatics makes sense except in the light of statistics.

This rule applies here as well (and should take care of Ian's complaint about PSWMs). The most critical issue is how to derive a proper cut-off value. A threshold can only be based on a statistical consideration of what can be expected by chance alone. In the simplest approximation, you could try to calculate the probability from the amino acid or nucleotide distribution. However, I wouldn't recommend doing this. What usually works much better is to run the PSWM against a database of similar size to the one you are scanning that is guaranteed NOT to contain a relevant instance of your motif (e.g. bacterial sequences when scanning for a eukaryotic motif). If such a database is not available, you could try to create a random database with one of the available methods (taking into account things like runs of nucleotides/amino acids).
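A minimal Python sketch of the empirical cut-off approach, under several assumptions not taken from the answer: the log-odds matrix uses a uniform 0.25 background and a 0.5 pseudocount, only the forward strand is scanned, sequences contain only A/C/G/T, and the negative set is generated from a first-order Markov chain fitted to real sequence. The Markov chain is just one simple way to keep dinucleotide frequencies and short runs realistic; it is not a specific published randomization method. All names and the toy sites are hypothetical. The cut-off is chosen so that at most a fraction `alpha` of negative windows score above it.

```python
import math
import random
from collections import defaultdict

BASES = "ACGT"

def log_odds_matrix(sites, pseudocount=0.5):
    """log2(p / 0.25) PSWM from equal-length aligned sites (uniform background assumed)."""
    n = len(sites)
    matrix = []
    for i in range(len(sites[0])):
        column = {}
        for b in BASES:
            count = sum(1 for s in sites if s[i] == b)
            p = (count + pseudocount) / (n + 4 * pseudocount)
            column[b] = math.log2(p / 0.25)
        matrix.append(column)
    return matrix

def window_scores(matrix, seq):
    """Score of every window of the matrix width (forward strand, ACGT-only input)."""
    w = len(matrix)
    return [sum(matrix[j][seq[i + j]] for j in range(w))
            for i in range(len(seq) - w + 1)]

def markov_background(training_seqs, length, rng):
    """Random sequence from a first-order Markov chain fitted to training_seqs
    (add-one smoothed), so dinucleotide frequencies and short runs stay realistic."""
    counts = defaultdict(lambda: {b: 1 for b in BASES})
    for s in training_seqs:
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:
                counts[a][b] += 1
    seq = [rng.choice(BASES)]
    while len(seq) < length:
        following = counts[seq[-1]]
        seq.append(rng.choices(BASES, weights=[following[b] for b in BASES])[0])
    return "".join(seq)

def empirical_cutoff(matrix, negative_seqs, alpha=0.001):
    """Cut-off c such that at most a fraction alpha of negative windows score > c."""
    scores = sorted((s for seq in negative_seqs for s in window_scores(matrix, seq)),
                    reverse=True)
    k = int(alpha * len(scores))
    if k >= len(scores):
        return -math.inf
    # Only the k highest-ranked scores can be strictly greater than scores[k].
    return scores[k]

rng = random.Random(0)
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]   # hypothetical aligned sites
pswm = log_odds_matrix(sites)
real_seqs = ["ATGCGCGCATATTTAAACGCG" * 5]          # stand-in for real genomic sequence
negatives = [markov_background(real_seqs, 1000, rng) for _ in range(20)]
cutoff = empirical_cutoff(pswm, negatives, alpha=0.001)
print(f"report windows scoring > {cutoff:.2f}")
```

With a real negative database (e.g. bacterial sequence for a eukaryotic motif), those sequences would be passed to `empirical_cutoff` directly instead of the Markov-generated ones.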

score 1 · Answer 2 · 2011-07-18

13.4 years ago

Ian 6.1k

I have a fairly negative opinion about using PSWMs to represent binding motifs. My primary objection, when scanning sequences, is that the 'hits' are highly dependent on the match cut-off. If I have used Weeder, for example, for motif discovery, I will use the dominant IUPAC (DNA pattern) to scan sequences of interest. The matches (even with IUPAC ambiguity) are much simpler to interpret.

Sorry, this is just my quick opinion on this subject. I would be interested to hear if anyone else has views on this issue as well.
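For comparison with PSWM scanning, here is a minimal Python sketch of scanning with an IUPAC pattern. It converts an IUPAC consensus into a regular expression and reports overlapping matches on both strands. The consensus TATAWA and the function names are hypothetical, and this does not parse Weeder output. Unlike a PSWM scan, there is no score threshold to tune: a window either matches the pattern or it does not.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def iupac_to_regex(pattern):
    """Translate an IUPAC DNA pattern into an equivalent regular expression."""
    return "".join(IUPAC[c] for c in pattern.upper())

def scan_iupac(pattern, seq):
    """Sorted (start, strand) matches on both strands; starts are forward-strand
    coordinates. The lookahead makes overlapping matches visible."""
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    n, w = len(seq), len(pattern)
    # A match at index i of the reverse complement covers forward positions n-i-w .. n-i-1.
    hits += [(n - m.start() - w, "-") for m in regex.finditer(reverse_complement)]
    return sorted(hits)

print(scan_iupac("TATAWA", "GGTATAAAGG"))  # [(2, '+')]
print(scan_iupac("TTTAAA", "TTTAAA"))      # [(0, '+'), (0, '-')], a palindromic site
```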