Best tool for finding short alignments between two sequences
7.0 years ago

I have one long sequence (several kb), which I can cut into pieces, and I want to find small parts that align (small repeats, from 10 to 500 bp long). What is the best algorithm to use, and why? Preferably I want to use it inside my Python script.

It should allow mismatches and gaps with a penalty (maybe 1 mismatch in 10 bp, one gap).

I've tried pairwise2.align.localxx, but it gave me very long alignments as output.
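For context, `localxx` scores a match as 1 and a mismatch as 0 and applies no gap penalty, so nothing stops a local alignment from growing; that is a likely reason for the long output. Biopython's `pairwise2.align.localms` accepts explicit match, mismatch, gap-open and gap-extend scores. To make the effect of penalties visible, here is a minimal plain-Python sketch of Smith-Waterman local alignment. The function name `smith_waterman`, the linear gap model, and the default scores (match 2, mismatch -3, gap -4) are assumptions chosen for illustration, not tuned values.

```python
def smith_waterman(a, b, match=2, mismatch=-3, gap=-4):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, (a_start, a_end), (b_start, b_end)),
    with 0-based, end-exclusive coordinates.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, floored at 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


# Toy sequences (made up): a shared 10 bp core with unrelated flanks.
a = "TTTTT" + "ACGATCGGCA" + "TTTTT"
b = "GGGGG" + "ACGATCGGCA" + "GGGGG"
score, (a_start, a_end), (b_start, b_end) = smith_waterman(a, b)
print(score, a[a_start:a_end], b[b_start:b_end])  # 20 ACGATCGGCA ACGATCGGCA
```

Because mismatches and gaps cost more than a match earns, the alignment stops at the edges of the shared core instead of running into the flanks. This quadratic-time version is only practical for sequences of a few kb; a dedicated tool is the better choice beyond that.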

alignment python • 2.4k views

To be honest, this sounds like something a motif analysis tool might do.

For example, finding a short enriched set of motif repeats in one long sequence. Maybe try MEME for this. You probably won't care about the TFs that bind, but it should find short enriched sequences in your long sequence. Then align those sequences with a multiple sequence aligner.
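As a rough illustration of the "repeated short sequences in one long sequence" idea, here is a minimal k-mer index sketch. It is not what MEME does: it only reports exact repeats, which could then serve as seeds to extend with a local aligner that tolerates mismatches and gaps. The function name `repeated_kmers` and the default `k=10` are assumptions for illustration.

```python
from collections import defaultdict


def repeated_kmers(seq, k=10):
    """Map each k-mer occurring more than once to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen at two or more positions.
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy sequence (made up): the same 10 bp unit twice, separated by a spacer.
seq = "ACGATCGGCA" + "TTTTT" + "ACGATCGGCA"
print(repeated_kmers(seq, k=10))  # {'ACGATCGGCA': [0, 15]}
```

A smaller `k` makes the seeds more tolerant of nearby mismatches at the cost of more spurious hits.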

7.0 years ago

BLAST and vmatch come to mind. Both are good at finding subsequences that align between query and reference. vmatch in particular is very flexible. Since the output of vmatch is not in any standard format, I wrote a converter to SAM format (here).


The problem with BLAST is that it takes one sequence as query and one as database. I want to find small repeats / alignments between the two of them. I'll check vmatch.

7.0 years ago
BioinfGuru ★ 2.1k

Plenty of tools here:

https://www.ebi.ac.uk/Tools/psa/
