I want to do string matching of say "ACCTGGATTTC" but allowing for mismatches such as substitutions/insertions/deletions. Can I use grep for this?
grep -E 'A.*C.*C.*T.*G.*G.*A.*T.*T.*T.*C'
Good luck. You'd better have a look at http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm
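To make the alignment idea concrete, here is a minimal Python sketch of approximate substring matching by dynamic programming. It is not plain Needleman-Wunsch (which aligns two whole sequences) but the closely related semi-global variant, where the match may start and end anywhere in the text. The function names min_edit_distance and matches are made up for illustration, and this is not how grep, agrep or BBDuk implement it.

```python
def min_edit_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # prev[i] = best distance for pattern[:i] against a substring of the
    # text ending at the previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]  # cost of matching the pattern against an empty substring
    for ch in text:
        curr = [0]  # a match may start at any text position at no cost
        for i, p in enumerate(pattern, start=1):
            curr.append(min(
                prev[i - 1] + (p != ch),  # match (cost 0) or substitution (cost 1)
                prev[i] + 1,              # extra base in the text (insertion)
                curr[i - 1] + 1,          # pattern base missing from the text (deletion)
            ))
        best = min(best, curr[-1])
        prev = curr
    return best


def matches(pattern: str, text: str, max_edits: int = 1) -> bool:
    """True if pattern occurs in text with at most max_edits edits."""
    return min_edit_distance(pattern, text) <= max_edits


if __name__ == "__main__":
    # Hypothetical read containing the query with one substituted base.
    print(matches("ACCTGGATTTC", "TTTACCTGCATTTCGG", max_edits=1))
```

This runs in time proportional to len(pattern) * len(text), so it is fine for trying the idea out on a few sequences but will be much slower than a dedicated tool on large files.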
You can use BBDuk to match that pattern allowing substitutions or indels:
bbduk.sh in=sequences.fasta literal=ACCTGGATTTC edist=1 k=11 outm=matched.fasta out=unmatched.fasta
That will allow an edit distance of 1, and will be extremely fast.
Download the BBMap package here, then unzip and untar it (tar xzf BBMap_34.94.tar.gz). Then it will work as long as you have Java installed. You will need to add the path to bbduk.sh to your environment or else type out the full path, e.g.
/user/bin/bbmap/bbduk.sh ... (other parameters)
...where the exact path is just the location you unzipped it.
agrep sounds promising, but I am unable to install it from GitHub. Are there any alternate sources? Thanks.
on Debian / Ubuntu (need to be root or use sudo):