I am using Perl to match short nucleotide sequences against FASTA sequences...
($GeneFasta =~ /$searchSeq/g)
I would like to perform this match, but allow for a mismatch in the search. Does anyone know if, and how, Perl can accomplish this?
The Bio::Grep module is pretty good, as it provides a common interface to several different fuzzy matchers, my favorite being Vmatch.
agrep (i.e., approximate grep) is a nice tool for this sort of thing. It's not a standard Linux tool, but it is a good one.
Here's one implementation: ftp://ftp.cs.arizona.edu/agrep/
From the README at the above URL:
" ...for example, "agrep -2 homogenos foo" will find homogeneous as well as any other word that can be obtained from homogenos with at most 2 substitutions, insertions, or deletions. "
You are looking for a fuzzy pattern matching program. Try the Perl module String::Approx:
"Perl extension for approximate matching (fuzzy matching)"
For fuzzy pattern matching exercises and scripts, go through the VCU bioinformatics notes on pattern matching.
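For the substitution-only case (mismatches, but no insertions or deletions), the idea behind approximate matching can be sketched in plain Perl without any module. This is only an illustration, not the String::Approx API: the helper name find_with_mismatches and the example sequences are made up, and a sliding window like this will be slow on large FASTA files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): return the 0-based start
# positions where $query matches $target with at most $max_mismatch
# substitutions. Insertions and deletions are not considered.
sub find_with_mismatches {
    my ($target, $query, $max_mismatch) = @_;
    my $qlen = length $query;
    my @hits;
    for my $start (0 .. length($target) - $qlen) {
        my $window = substr $target, $start, $qlen;
        # XOR of two equal-length strings gives "\0" wherever the characters
        # agree, so counting the non-NUL bytes gives the number of mismatches.
        my $xor        = $window ^ $query;
        my $mismatches = ($xor =~ tr/\0//c);
        push @hits, $start if $mismatches <= $max_mismatch;
    }
    return @hits;
}

my $GeneFasta = 'TTACGTACGAAACCGTTT';   # made-up sequence for illustration
my $searchSeq = 'ACGTT';                # made-up query for illustration

for my $pos (find_with_mismatches($GeneFasta, $searchSeq, 1)) {
    print "hit at $pos: ", substr($GeneFasta, $pos, length $searchSeq), "\n";
}
```

Unlike a /g regex loop, this reports overlapping hits too, since every start position is tested independently.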
Assigning a regexp to a scalar as a plain string is fragile in Perl for sub-sequence pattern matches (a double-quoted string processes backslash escapes before the regex engine sees them), e.g.
$searchSeq = "AAA[TA]";
Instead, use the quote regular expression (qr) operator:
$searchSeq = qr/AAA[TA]/;
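A minimal sketch of how a qr pattern could be used in the /g loop from the question, and of one way to build a pattern that tolerates a single substitution. The helper name one_mismatch_regex and the example sequences are hypothetical, and the helper assumes the query contains only the letters A, C, G and T (no regex metacharacters).

```perl
use strict;
use warnings;

# Hypothetical helper: build a compiled pattern that matches $seq with at
# most one substituted base, by allowing any base at each position in turn.
sub one_mismatch_regex {
    my ($seq) = @_;
    my @alternatives = map {
        my $copy = $seq;
        substr($copy, $_, 1) = '[ACGT]';   # wildcard one position
        $copy;
    } 0 .. length($seq) - 1;
    my $pattern = join '|', @alternatives;
    return qr/$pattern/;
}

my $GeneFasta = 'TTACGTACGAAACCGTTT';      # made-up sequence for illustration
my $searchSeq = one_mismatch_regex('ACGTT');

# In scalar context, /g advances through the string one match per iteration;
# $-[0] is the 0-based start offset of the current match.
my @starts;
while ($GeneFasta =~ /$searchSeq/g) {
    push @starts, $-[0];
}
print "match at $_\n" for @starts;
```

Note that a /g loop resumes after the end of the previous match, so overlapping hits are not reported, and the alternation grows with the query length, so this only suits short queries.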
This is a bad idea. Why don't you use a short-read aligner instead?