String matching algorithms of biological sequences
3
0
Entering edit mode
6.8 years ago
markic • 0

Hi,

I would like test accuracy, speed of some basics string matching algorithm on biological sequence. Where can i find a good library (python, c, c#, ... whatever) with implementation of string matching algorithm or service on the web? Do you have something that would help me, advise, ...?

sequence genome gene • 3.2k views
ADD COMMENT
0
Entering edit mode

Whats wrong with strstr, or grep

ADD REPLY
0
Entering edit mode

Nothing, but i need more algorithms with scientific approach and compare them on different data sets.

ADD REPLY
1
Entering edit mode

I am predicting, it will be hard to beat strstr or pcmpestri unless you do some precomputation on the haystack (suffixtree etc.).

ADD REPLY
0
Entering edit mode

Do you have python library?

ADD REPLY
2
Entering edit mode
6.8 years ago

EXACT STRING MATCHING ALGORITHMS / Christian Charras - Thierry Lecroq : http://www-igm.univ-mlv.fr/~lecroq/string/ "Brute Force algorithm Deterministic Finite Automaton algorithm Karp-Rabin algorithm Shift Or algorithm Morris-Pratt algorithm Knuth-Morris-Pratt algorithm Simon algorithm Colussi algorithm Galil-Giancarlo algorithm Apostolico-Crochemore algorithm Not So Naive algorithm Boyer-Moore algorithm Turbo BM algorithm Apostolico-Giancarlo algorithm Reverse Colussi algorithm Horspool algorithm Quick Search algorithm Tuned Boyer-Moore algorithm Zhu-Takaoka algorithm Berry-Ravindran algorithm Smith algorithm Raita algorithm Reverse Factor algorithm Turbo Reverse Factor algorithm Forward Dawg Matching algorithm Backward Nondeterministic Dawg Matching algorithm Backward Oracle Matching algorithm Galil-Seiferas algorithm Two Way algorithm String Matching on Ordered Alphabets algorithm Optimal Mismatch algorithm Maximal Shift algorithm Skip Search algorithm KMP Skip Search algorithm Alpha Skip Search algorithm"

and their implementations in C...

ADD COMMENT
0
Entering edit mode

Thought this was spam (Dawg matching algorithm?) before I saw the embedded link :-)

ADD REPLY
0
Entering edit mode

Do these algorithms work properly?

ADD REPLY
1
Entering edit mode
6.8 years ago
sacha ★ 2.4k

Have a look on seqan c++ library. http://seqan.readthedocs.io/en/master/

For instance, it uses 2 bits per nucleotides instead 8 used by plain text sequence.

ADD COMMENT
0
Entering edit mode

Thanks. I will try this. It seems good and easy to use.

ADD REPLY
1
Entering edit mode
6.8 years ago
chen ★ 2.5k

You may have interest to take a look at the source code of MutScan.

MutScan is based on DNA sequence string matching algorithm, and it can detect and visualize target mutations by scanning FastQ files directly.

ADD COMMENT
0
Entering edit mode

Thanks. If you have any good library please send it to me.

ADD REPLY
0
Entering edit mode

Sorry, but I don't have a library for that.

ADD REPLY

Login before adding your answer.

Traffic: 2087 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6