There are several approaches to motif finding, ranging from regular expressions to neural networks, with PWMs and HMMs in between (and perhaps SVM classifiers as well). There is a tradeoff between a model's ability to represent a complex motif and the amount of data needed to build it. A regular expression can probably be defined from a handful of sequences; you need a few more to define a useful PWM, more still to train a profile HMM, and a lot to train an SVM or a NN.
With TFBSs it's a question of how much data you have and how complex the binding site is. Usually a PWM will work. An HMM can model dependencies between adjacent positions, which might be useful; it's a matter of training data availability and biological reasoning.
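To make the PWM option concrete, here is a minimal sketch of building a log-odds PWM from a few aligned sites and scanning a sequence with it. The function names, the pseudocount of 0.5, the uniform background of 0.25 and the example sites are all assumptions for illustration, not values from any particular tool.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from equal-length, aligned binding sites.

    pseudocount and background are illustrative defaults (assumptions).
    """
    length = len(sites[0])
    pwm = []
    for i in range(length):
        # Start every base at the pseudocount so unseen bases do not give log(0).
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for one window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Return (score, position) of the best-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

# Hypothetical example sites, only to show the calls.
sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
pwm = build_pwm(sites)
hit_score, hit_pos = best_hit(pwm, "GGGGTTGACAGGGG")
```

With only four sites the weights depend heavily on the pseudocount, which is the data-versus-complexity tradeoff mentioned above.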
How related are those bacteria? You are dealing with a lot of uncertainty here: you're not even sure there is a TFBS where you're looking. Maybe, if you have a list of genes for each bacterium, you can run MEME on each one to get a PWM and then compare the binding sites or the PWM weights.
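One simple way to compare two PWMs, as a rough sketch: take the per-position base probabilities and average the Euclidean distance between matching columns. This assumes the two matrices have the same width and are already aligned in the same orientation; parsing the MEME output is not shown, and the function names and example matrices are hypothetical. The MEME Suite also ships Tomtom for motif comparison, which handles offsets and reverse complements.

```python
import math

def column_distance(p, q):
    """Euclidean distance between two [A, C, G, T] probability columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def matrix_distance(m1, m2):
    """Mean column distance between two equal-width probability matrices.

    0.0 means identical columns; sqrt(2) is the maximum for a single column.
    """
    if len(m1) != len(m2):
        raise ValueError("matrices must have the same width")
    return sum(column_distance(p, q) for p, q in zip(m1, m2)) / len(m1)

# Hypothetical probability matrices for two genomes (columns are A, C, G, T).
pwm_a = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
pwm_b = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]]
same = matrix_distance(pwm_a, pwm_a)
diff = matrix_distance(pwm_a, pwm_b)
```

A small distance would suggest the two genomes can share one matrix; where to put the cutoff is a judgment call.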
Dear Asaf, Many thanks for your comment!
I have 200 bacteria. Roughly every 3-5 bacteria are
close relatives, and a single PWM works perfectly for them – small site distance
(<100 nucleotides) and high binding site weight (>5.0).
But the next group needs a different PWM, since their output from the first
matrix shows a larger site distance (>100) and smaller site weights (about 4.5).
The third group may go back to the first PWM, but it’s impossible to predict
such behavior beforehand. The worst result is a distance > 200 and
a binding site weight < 4.0. That is the signal that I have to change my PWM.
I have only a home-made tool for that, and it’s definitely not enough.
And I wouldn’t like to do this check manually anymore.
It usually takes too much time and effort.
Could you PLEASE recommend some articles and software to deal
with my problem? I have a feeling this calls for a NN, but I may be wrong.
Many-many THANKS!!
Natasha
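The manual check described in this comment could be automated with a small rule before reaching for anything as heavy as a NN. The sketch below uses the thresholds quoted above (distance < 100 and weight > 5.0 as good, distance > 200 and weight < 4.0 as the signal to change the PWM); the function names, the "borderline" label and the input layout are assumptions.

```python
def classify_hit(distance, weight):
    """Label one best hit using the thresholds quoted in the comment above."""
    if distance < 100 and weight > 5.0:
        return "good"
    if distance > 200 and weight < 4.0:
        return "change_pwm"
    # Everything in between (e.g. distance > 100, weight about 4.5).
    return "borderline"

def genomes_needing_new_pwm(results):
    """results: dict mapping genome name -> (distance, weight) of its best site."""
    return [name for name, (distance, weight) in results.items()
            if classify_hit(distance, weight) == "change_pwm"]

# Hypothetical results for three genomes.
results = {
    "genome_1": (50, 5.5),
    "genome_2": (150, 4.5),
    "genome_3": (250, 3.5),
}
flagged = genomes_needing_new_pwm(results)
```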
Thank you, I will try MEME.
Do you know of other tools that produce PWMs?
Addition:
I should have found this post earlier, sorry!
How can I create a more accurate PWM?
MEME is really useful. And the whole right panel of the post above as well.
Converting motif databases from meme suite to other formats
And another post below where I'd written an answer myself...
Is there any paper about motif finding based on PWM on genome sequence?
And many more... This one describes another approach.
transcription factor binding sites