Question

Matching Strings With Mismatches

4

14.6 years ago

Krisr ▴ 470

I am using Perl to match short nucleotide sequences against FASTA sequences...

($GeneFasta =~ /$searchSeq/g)

I would like to perform this match, but allow for a mismatch in the search. Does anyone know if, and how, Perl can accomplish this?

perl sequence • 17k views

ADD COMMENT • link updated 14.5 years ago by Alastair Kerr 5.3k • written 14.6 years ago by Krisr ▴ 470

8

This is a bad idea. Why don't you use a short-read aligner instead?

ADD REPLY • link 14.6 years ago by Giovanni M Dall'Olio 28k

score 7 · Answer 1 · 2010-12-15

7

14.6 years ago

Jeremy Leipzig 23k

The Bio::Grep module is pretty good, as it provides a common interface to several different fuzzy matchers, my favorite being Vmatch.

ADD COMMENT • link 14.6 years ago by Jeremy Leipzig 23k

score 6 · Answer 2 · 2010-12-16

6

14.6 years ago

Aaronquinlan 12k

agrep (i.e., approximate grep) is a nice tool for this sort of thing. It's not a standard Linux tool, but it is a good one.

Here's one implementation: ftp://ftp.cs.arizona.edu/agrep/

From the README at the above URL:

"...for example, "agrep -2 homogenos foo" will find homogeneous as well as any other word that can be obtained from homogenos with at most 2 substitutions, insertions, or deletions."
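To make the "at most 2 substitutions, insertions, or deletions" criterion concrete, here is a minimal sketch of the classic edit-distance calculation in plain Perl. It is not agrep's algorithm, and it compares two whole strings rather than searching inside lines the way agrep does; the function name `edit_distance` is made up for this illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Dynamic-programming edit distance: substitutions, insertions and
# deletions each cost 1. Only one row of the table is kept at a time.
# (edit_distance is a hypothetical helper, not part of agrep.)
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @curr = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,          # deletion from $s
                $curr[$j - 1] + 1,      # insertion into $s
                $prev[$j - 1] + $cost,  # match or substitution
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Two insertions ("e" and "u") turn homogenos into homogeneous.
print edit_distance("homogenos", "homogeneous"), "\n";   # prints 2
```

A word passes an "agrep -2"-style filter when this distance is 2 or less, which is why homogeneous is reported for the pattern homogenos.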

ADD COMMENT • link 14.6 years ago by Aaronquinlan 12k

0

Thanks. I'm impressed by the quality of this tool.

ADD REPLY • link 14.6 years ago by Eric Normandeau 11k

0

Yeah, believe it or not, 3 years ago I hacked it briefly as a short-read aligner.

ADD REPLY • link 14.6 years ago by Aaronquinlan 12k

score 5 · Answer 3 · 2010-12-15

5

14.6 years ago

Rm 8.3k

You are looking for a fuzzy pattern matching program. Try the Perl module String::Approx:

"Perl extension for approximate matching (fuzzy matching)"

For fuzzy pattern matching exercises and scripts, go through the VCU bioinformatics notes on pattern matching.
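For the narrow case in the question (substitutions only, no gaps), fuzzy matching can also be sketched without any module by sliding a window along the sequence and counting differing positions. This is not how String::Approx works internally; `mismatch_positions` and the sequences below are invented for the example.

```perl
use strict;
use warnings;

# Return the 0-based start positions where $query matches $target with
# at most $max_mismatch substitutions (no insertions or deletions).
# (mismatch_positions is a hypothetical helper for illustration.)
sub mismatch_positions {
    my ($target, $query, $max_mismatch) = @_;
    my $qlen = length($query);
    my @hits;
    for my $start (0 .. length($target) - $qlen) {
        my $window = substr($target, $start, $qlen);
        # XOR of two equal-length strings gives "\0" wherever they agree,
        # so counting the non-NUL bytes counts the mismatches.
        my $mismatches = (($window ^ $query) =~ tr/\0//c);
        push @hits, $start if $mismatches <= $max_mismatch;
    }
    return @hits;
}

my $gene_fasta = "TTGACGTACCTGACTTACG";   # made-up sequence
my $search_seq = "ACGTACG";               # made-up query

for my $pos (mismatch_positions($gene_fasta, $search_seq, 1)) {
    print "hit at $pos: ", substr($gene_fasta, $pos, length($search_seq)), "\n";
}
```

With these inputs the hits are at offsets 3 (ACGTACC) and 12 (ACTTACG), each one substitution away from the query. The scan is quadratic in the worst case, so it suits short queries against modest sequences; for read-scale data a dedicated aligner is the better fit.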

ADD COMMENT • link 14.6 years ago by Rm 8.3k

1

I've had some issues with that module - both false positives and misses.

ADD REPLY • link 14.6 years ago by Jeremy Leipzig 23k

Ram · Answer 4 · 2010-12-17

1

14.6 years ago

Alastair Kerr 5.3k

Assigning a regexp to a scalar as a plain string is fragile in Perl for sub-sequence pattern matches, e.g.

$searchSeq = "AAA[TA]";

Instead, use the quote-regex (qr) operator:

$searchSeq = qr/AAA[TA]/;
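A small sketch comparing the two forms against a made-up sequence. Note that a plain string still interpolates into a match in Perl; what qr adds is that the pattern is compiled once and can carry its own flags. The variable names and the sequence are assumptions for the example.

```perl
use strict;
use warnings;

my $gene_fasta = "GGCAAATCC";    # made-up sequence

my $as_string = "AAA[TA]";       # pattern kept in a plain string
my $as_regex  = qr/AAA[TA]/;     # pattern compiled once with qr

# Both scalars can be interpolated into a global match.
for my $pattern ($as_string, $as_regex) {
    while ($gene_fasta =~ /($pattern)/g) {
        # pos() is the offset just past the end of the last match.
        printf "matched %s, ending at offset %d\n", $1, pos($gene_fasta);
    }
}
```

The qr form becomes more useful once the pattern needs modifiers, for instance qr/aaa[ta]/i, because the flag travels with the scalar instead of having to be repeated at every match site.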

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 14.6 years ago by Alastair Kerr 5.3k