I have a protein motif (site) that I would like to identify in DNA sequences (a multi-FASTA file). The motif is N-X-S/T (X != P), which means Asn, followed by any amino acid except Pro, followed by Ser or Thr. X should also not be a STOP codon. So I would like to find all the 3-codon combinations for this site in DNA (9 nucleotides).
I first thought of writing the motif in DNA using IUPAC ambiguity codes, but that did not seem possible. Writing out all the possibilities by hand seems too laborious, so I thought there might be a tool that can do this. Any suggestions?
Doesn't BLAST(P) already support certain redundant characters?
I'm not sure you'll be able to define all of those exactly, since typically X means any amino acid (I think), without any restriction. You may not be able to find an alphabet that supports all of what you need. You could maybe blast NXS and NXT, and then filter the results with a regex to make sure that the next codon is != *
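The regex filter suggested above can also be applied directly, without BLAST, by translating the DNA and searching the protein string. The following is only a minimal sketch under several assumptions: it uses the standard genetic code, scans the three forward reading frames only (the reverse strand would need a reverse-complement step), and treats any codon with ambiguous bases as an unknown residue that does not count as X. The names `translate` and `find_sites` are hypothetical helpers, not part of any existing tool.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# N, then anything except Pro, stop (*) or unknown (X), then Ser or Thr.
# The lookahead lets overlapping sites (e.g. N-N-S-T) all be reported.
MOTIF = re.compile(r"(?=(N[^P*X][ST]))")


def translate(dna):
    """Translate in frame 0; codons with non-ACGT bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_sites(dna):
    """Yield (frame, nt_start, nine_mer, tripeptide) for forward frames."""
    dna = dna.upper()
    for frame in range(3):
        protein = translate(dna[frame:])
        for m in MOTIF.finditer(protein):
            start = frame + 3 * m.start(1)  # 0-based nucleotide position
            yield frame, start, dna[start:start + 9], m.group(1)


if __name__ == "__main__":
    for hit in find_sites("GAATGGTACC"):
        print(hit)
```

For a multi-FASTA file, `find_sites` would be called once per record. As a side note, enumerating the DNA 9-mers directly is feasible too: with the standard code there are 2 Asn codons, 57 codons that are neither Pro nor stop, and 10 Ser/Thr codons, so 2 x 57 x 10 = 1140 combinations.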