Hello,
I've been trying to write code to find a consensus motif in a given sequence, but so far I have only got as far as finding a substring in a string. I want to be able to allow multiple nucleotides/amino acids at each position, and also to enter N/X to represent any nucleotide/amino acid. I would very much appreciate any help.
Thanks.
P.S. The post tags represent the languages I'm comfortable understanding.
Edit: Example of the consensus motif - A/T A A G C A A/T/G N N A
Sequence - CGATCGTG TAAGCAGCTA GTCATG
The middle segment (TAAGCAGCTA, bolded in the original post) is the match to the consensus.
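One way to approach this is to translate the motif into a regular expression, with a character class for each position that allows several residues and a dot for the wildcard. Below is a minimal sketch in Python, not a definitive implementation. The function names motif_to_regex and find_motif are made up for illustration, and it assumes single-letter codes separated by spaces, with "/" between the alternatives at one position. The wildcard letter is a parameter because N means "any base" for DNA but is asparagine in a protein sequence, where X would be the wildcard instead.

```python
import re


def motif_to_regex(motif, wildcard="N"):
    """Turn a motif such as 'A/T A A G N' into a regex string.

    Assumes single-letter codes, positions separated by spaces,
    and '/' between the alternatives allowed at one position.
    """
    parts = []
    for position in motif.split():
        if position == wildcard:
            parts.append(".")  # any single residue
            continue
        options = position.split("/")
        if len(options) == 1:
            parts.append(re.escape(options[0]))
        else:
            # Several allowed residues become a character class, e.g. [AT]
            parts.append("[" + "".join(re.escape(o) for o in options) + "]")
    return "".join(parts)


def find_motif(motif, sequence, wildcard="N"):
    """Return (start_index, matched_text) for every match, overlaps included."""
    # The lookahead keeps each match zero-width, so overlapping hits are found too.
    pattern = re.compile("(?=(" + motif_to_regex(motif, wildcard) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


motif = "A/T A A G C A A/T/G N N A"
sequence = "CGATCGTGTAAGCAGCTAGTCATG"  # the example sequence, written without spaces

print(motif_to_regex(motif))        # [AT]AAGCA[ATG]..A
print(find_motif(motif, sequence))  # [(8, 'TAAGCAGCTA')]
```

The start index is 0-based, so 8 means the match begins at the ninth residue. For a protein motif, pass wildcard="X".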
Thanks a lot. It's perfect.
Along the same lines as Carlo Yague's answer
Thanks. This works too. I could use .{2} when I have longer runs of any nucleotide/amino acid. Although, I would like to know why it is [\1G] and not [ATG]?
The first AT is made into a group, and you can then call it anywhere, at any time, by its serial number (1 here).
That's an extremely handy option. Thanks again :)
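On the [\1G] versus [ATG] question above, it may be worth checking how your own regex engine treats the two spellings. The sketch below is specific to Python's re module, where a numeric escape inside square brackets is read as a character code, not as a reference to group 1, and where a backreference outside brackets repeats the exact text the group matched, not the set of options. Other engines may behave differently. The test sequences and the helper name hits are invented for illustration.

```python
import re

PLAIN = r"[AT]AAGCA[ATG].{2}A"           # position 7 may be A, T or G
CLASS_ESCAPE = r"([AT])AAGCA[\1G].{2}A"  # in Python: chr(1) or G at position 7
BACKREF = r"([AT])AAGCA\1.{2}A"          # position 7 must repeat what group 1 matched


def hits(pattern, sequence):
    """Return True if the pattern matches anywhere in the sequence."""
    return re.search(pattern, sequence) is not None


for seq in ("TAAGCAGCTA", "TAAGCAACTA", "TAAGCATCTA"):
    print(seq, hits(PLAIN, seq), hits(CLASS_ESCAPE, seq), hits(BACKREF, seq))
# TAAGCAGCTA True True False
# TAAGCAACTA True False False
# TAAGCATCTA True False True
```

With G at position 7, as in the example sequence, both bracketed spellings match, so the difference is easy to miss. Only [ATG] accepts all three residues the motif allows at that position.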