I'm trying to find promoters in a bacterial genome, for sigma 70 for example. One approach I tried was to use blastn to query one of the motifs in the sigma 70 binding site, TTGACA, but it seems to be too short to find anything (using the Aquifex VF5 genome FASTA file as the database). How would others suggest finding promoters?
Is it possible to include ambiguous nucleotides in blastn (in that case, I could write something like TTGACANN....NNNTATAAT for the ~35 to ~10 bp region upstream of a gene/operon)? Are there other tools for finding shorter sequences? I've tried setting word_size to 4, and that didn't work, and neither did -task blastn-short.
To clarify, TATAAT is the other sigma 70 binding motif.
I also found this post, but it wasn't helpful: the main suggestion there seemed to be to use HMMER, but I'm looking at nucleotides, not proteins... https://www.reddit.com/r/bioinformatics/comments/6ue3w5/promoter_sequence_analysis_tools/?st=j6hk7a15&sh=875e00e7
(This got taken down in the bioinformatics subreddit, so I'm asking here, in case you also saw it there...)
Have you tried using actual promoter prediction tools?
bprom is a classical one, but there are probably better/more up-to-date ones by now.
Thank you for the pointer! I guess it makes sense to look for such a thing now that you say it, it just hadn't occurred to me!
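On the original question about writing something like TTGACANN....NNNTATAAT: a plain regular-expression scan is one way to search for two short motifs separated by a variable gap, which blastn is not designed for. Below is a minimal sketch in Python, not a real promoter predictor. The 15 to 19 bp spacer range is my assumption (the post only gives the two hexamers), the function names are made up for illustration, and it only finds exact matches to the consensus, which real promoters often don't have.

```python
import re

# Consensus hexamers taken from the post.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"

# ASSUMPTION: a 15-19 bp spacer between the two hexamers; adjust as needed.
# The lookahead lets matches starting at different positions overlap.
# Only one match (the longest spacer that fits) is reported per start position.
PATTERN = re.compile(rf"(?=({MINUS_35}[ACGTN]{{15,19}}{MINUS_10}))")

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_promoter_like(seq):
    """Scan both strands for TTGACA-spacer-TATAAT (hypothetical helper).

    Returns sorted (start, strand, matched_sequence) tuples, where start is
    the 0-based forward-strand coordinate of the leftmost base of the hit.
    """
    seq = seq.upper()
    hits = []
    for m in PATTERN.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    for m in PATTERN.finditer(rc):
        hit = m.group(1)
        # Convert the position on the reverse complement back to
        # forward-strand coordinates.
        start = len(seq) - (m.start() + len(hit))
        hits.append((start, "-", hit))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence for illustration only, not real genome data.
    toy = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
    for start, strand, hit in find_promoter_like(toy):
        print(start, strand, hit)
```

To run this on a genome you would read the FASTA sequence into a string first and pass it to `find_promoter_like`. Because the consensus is rarely matched exactly, expect few or no hits with this strict pattern; that is part of why dedicated prediction tools such as bprom score candidates instead of requiring exact matches.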