Matching Strings With Mismatches
14.0 years ago
Krisr ▴ 470

I am using Perl to match short nucleotide sequences against FASTA sequences:

($GeneFasta =~ /$searchSeq/g)

I would like to perform this match but allow for one mismatch in the search. Does anyone know whether, and how, Perl can do this?

perl sequence • 16k views

This is a bad idea. Why don't you use a short-read aligner instead?

14.0 years ago

The Bio::Grep module is pretty good: it provides a common interface to several different fuzzy matchers, my favorite being Vmatch.

14.0 years ago

agrep (i.e., approximate grep) is a nice tool for this sort of thing. It's not a standard Linux tool, but it is a good one.

Here's one implementation: ftp://ftp.cs.arizona.edu/agrep/

from the README at the above URL:

" ...for example, "agrep -2 homogenos foo" will find homogeneous as well as any other word that can be obtained from homogenos with at most 2 substitutions, insertions, or deletions. "
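To make "at most 2 substitutions, insertions, or deletions" concrete, here is a rough pure-Perl sketch of approximate substring search by edit distance. It is only an illustration of the idea, not agrep's own implementation, and `approx_contains` is a made-up helper name.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Return 1 if some substring of $text is within $k edits (substitutions,
# insertions, deletions) of $pattern, else 0.
# Hypothetical helper for illustration; this is not agrep's code.
sub approx_contains {
    my ($text, $pattern, $k) = @_;
    my @p = split //, $pattern;
    my $m = length $pattern;
    # $col[$i] = fewest edits needed to align the first $i pattern characters
    # with some substring of the text ending at the current text position.
    my @col = (0 .. $m);
    return 1 if $col[$m] <= $k;
    for my $c (split //, $text) {
        my @new = (0);    # a match may start anywhere in the text at no cost
        for my $i (1 .. $m) {
            my $cost = $p[$i - 1] eq $c ? 0 : 1;
            $new[$i] = min(
                $col[$i - 1] + $cost,    # match or substitution
                $col[$i] + 1,            # extra character in the text
                $new[$i - 1] + 1,        # pattern character missing from the text
            );
        }
        @col = @new;
        return 1 if $col[$m] <= $k;
    }
    return 0;
}

print approx_contains("a homogeneous mixture", "homogenos", 2) ? "found\n" : "not found\n";
```

The cost is proportional to text length times pattern length, so this is fine for short search sequences but will be slow on large FASTA files compared with a dedicated tool.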


Thanks. I'm impressed by the quality of this tool.


Yeah, believe it or not, 3 years ago I hacked it briefly as a short-read aligner.

14.0 years ago
Rm 8.3k

You are looking for a fuzzy pattern matching program. Try the Perl module String::Approx:

"Perl extension for approximate matching (fuzzy matching)"

For fuzzy pattern matching exercises and scripts, go through the VCU bioinformatics notes on pattern matching.
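If only substitutions matter (mismatches, no insertions or deletions), a dependency-free check is easy to sketch in plain Perl. The following is a minimal example that does not use String::Approx; `mismatch_positions` is a hypothetical helper name, and the sequences are made up.

```perl
use strict;
use warnings;

# Return the 0-based start positions where $pattern occurs in $text
# with at most $max_mismatch substitutions (no insertions or deletions).
# Hypothetical helper for illustration only.
sub mismatch_positions {
    my ($text, $pattern, $max_mismatch) = @_;
    my $len = length $pattern;
    my @hits;
    for my $i (0 .. length($text) - $len) {
        my $window = substr($text, $i, $len);
        # XOR of two equal-length strings gives "\0" wherever they agree,
        # so counting the non-NUL bytes gives the number of mismatches.
        my $mismatches = ($window ^ $pattern) =~ tr/\0//c;
        push @hits, $i if $mismatches <= $max_mismatch;
    }
    return @hits;
}

my @hits = mismatch_positions("ACGTACGAACGT", "ACGA", 1);
print "@hits\n";    # prints: 0 4 8
```

Unlike a `/g` regexp match, this slides one base at a time, so overlapping hits are reported too.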


I've had some issues with that module - both false positives and misses.

14.0 years ago

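A pattern stored in a plain string scalar does get interpolated as a regexp in a match, but any backslash escapes are processed by the string quoting first, which can silently change the pattern, e.g.

$searchSeq = "AAA[TA]";

It is more robust to use the quote-regexp (qr) operator, which compiles the pattern as a regexp:

$searchSeq = qr/AAA[TA]/;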
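
Taking the character-class idea one step further, a regexp that tolerates one mismatch anywhere can be generated rather than written by hand. This is only a sketch: `one_mismatch_regex` is a hypothetical helper name, and the `[ACGTN]` alphabet and the sequences are assumptions.

```perl
use strict;
use warnings;

# Build a compiled regexp matching $seq exactly or with one substituted base.
# Hypothetical helper; the [ACGTN] alphabet is an assumption about the data.
sub one_mismatch_regex {
    my ($seq) = @_;
    my @alternatives;
    for my $i (0 .. length($seq) - 1) {
        # Keep every position literal except position $i, which may be any base.
        push @alternatives,
            quotemeta(substr($seq, 0, $i)) . '[ACGTN]' . quotemeta(substr($seq, $i + 1));
    }
    my $pattern = join '|', @alternatives;
    return qr/(?:$pattern)/;
}

my $searchSeq = one_mismatch_regex("ACGT");
my $GeneFasta = "TTACCTGG";
while ($GeneFasta =~ /$searchSeq/g) {
    my $start = $-[0];    # offset where the current match begins
    print "match at $start: ", substr($GeneFasta, $start, 4), "\n";
}
# prints: match at 2: ACCT
```

The pattern grows with the length of the search sequence and only covers substitutions, so for long sequences or more than one mismatch a dedicated fuzzy matcher is likely the better choice.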