Semi-Global Alignment Tool?
14.2 years ago
Ryan Thompson ★ 3.6k

Does anyone know of any general-purpose semiglobal alignment tools? Something like BLAST for semiglobal alignments instead of local alignments.

A semiglobal alignment is like a global alignment, but penalty-free gaps are allowed at the beginning and end of the alignment. See Wikipedia for a bit more information on semiglobal alignments.

Edit: It has come to my attention that the term "semiglobal alignment" is ambiguous; it is used to describe several different types of alignment. What I am looking for is a global alignment with no penalty for gaps at the sequence ends.

I want to mask a short (42 bp) sequence from a full lane of paired-end 100 nt Illumina reads. The sequence is expected to occur anywhere within any read with equal probability, including a partial overlap on either end; if it appears in the middle of a read, it has to be the whole sequence. So I need to do an ends-free alignment of the short sequence against each read independently.


Would it be reasonable to use something like BLAST or any other heuristic local alignment/mapping program, and then post-process the results to filter out hits that are not end-free alignments?
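A rough sketch of what that post-processing filter could look like, assuming each hit has already been converted to 0-based, half-open, forward-strand coordinates (BLAST's tabular output is 1-based and inclusive, so it would need converting first). The function name and argument layout are hypothetical, not part of any tool:

```python
def is_ends_free_hit(q_start, q_end, s_start, s_end, q_len, s_len):
    """Return True if a local hit is compatible with an ends-free alignment.

    Coordinates are 0-based, half-open, on the forward strand.
    q_* refer to the short query, s_* to the read (subject).
    """
    # Left side: only one sequence may overhang, so the hit must begin
    # at the start of the query or at the start of the read.
    left_ok = q_start == 0 or s_start == 0
    # Right side: the hit must run to the end of the query or of the read.
    right_ok = q_end == q_len or s_end == s_len
    return left_ok and right_ok
```

One caveat: heuristic local aligners tend to trim a hit when there is a mismatch near its end, so a strict filter like this may discard true ends-free matches. Allowing a small tolerance on each boundary might be needed in practice.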

14.2 years ago
brentp 24k

Check here, where I modified Marcin Cieślik's (modification of my) code to do various alignments, including glocal (a combination of global and local), which does what you want. It's a Python/Cython module.

You should be able to install with:

git clone https://github.com/brentp/align.git
cd align
sudo python setup.py install

and then use as:

>>> from align import aligner
>>> aligner('WW','WEWWEW', method='glocal')
('WW', 'WW')

Hope that helps.


The setup.py install step is crashing. How do I debug that?


Ok, I figured out that setup.py produces a cryptic error if Cython is not installed. You should probably fix that.


Actually, it turns out that I am looking for the alignment mode that you call "global_cfe". Are there standard definitions for any type of alignment other than local and global?


There are several combinations it seems, like global-local or local-global.


Ryan Thompson, thanks for reporting install problems. Fixed as of: http://github.com/brentp/align/commit/c7fd7c16ec0cd10fc44df633dcb272ffc7dd690f


Is there a way to return the alignment score and the start/end indices of the alignment in the original input sequences?

14.2 years ago
Michael 55k

The method pairwiseAlignment in the Bioconductor package Biostrings does this out of the box:

From the manual:

type - type of alignment. One of "global", "local", "overlap", "global-local", and "local-global", where "global" = align whole strings with end gap penalties, "local" = align string fragments, "overlap" = align whole strings without end gap penalties, "global-local" = align whole strings with end gap penalties on pattern and without end gap penalties on subject, "local-global" = align whole strings without end gap penalties on pattern and with end gap penalties on subject.

The document Pairwise Sequence Alignments is a tutorial about how to do alignments with R.
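To make the difference between the types in the manual excerpt concrete, here is a small pure-Python sketch (not Biostrings code). It assumes that "without end gap penalties on X" means the unaligned leading and trailing part of X costs nothing, and it uses a simple linear gap penalty rather than the affine penalties a real tool would typically use. The names `FREE_ENDS` and `alignment_score` and the scoring values are made up for illustration, and the "local" type is left out:

```python
# For each type: (pattern ends are free, subject ends are free).
# This mapping is an interpretation of the manual text, not taken from Biostrings.
FREE_ENDS = {
    "global": (False, False),
    "overlap": (True, True),
    "global-local": (False, True),
    "local-global": (True, False),
}


def alignment_score(pattern, subject, kind="overlap",
                    match=1, mismatch=-1, gap=-2):
    """Best alignment score under the chosen end-gap rules (linear gaps)."""
    free_pattern, free_subject = FREE_ENDS[kind]
    m, n = len(pattern), len(subject)
    # H[i][j] = best score for pattern[:i] against subject[:j].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    # Borders: skipping a prefix costs nothing only if that sequence's ends are free.
    for i in range(1, m + 1):
        H[i][0] = 0 if free_pattern else i * gap
    for j in range(1, n + 1):
        H[0][j] = 0 if free_subject else j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if pattern[i - 1] == subject[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + sub,  # align the two characters
                          H[i - 1][j] + gap,      # gap in the subject
                          H[i][j - 1] + gap)      # gap in the pattern
    # The alignment may stop early along the last row or column when the
    # corresponding trailing overhang is free.
    candidates = [H[m][n]]
    if free_subject:
        candidates.extend(H[m][j] for j in range(n + 1))
    if free_pattern:
        candidates.extend(H[i][n] for i in range(m + 1))
    return max(candidates)
```

With these settings, `alignment_score("WW", "WEWWEW", "global")` pays for four gaps, while `"global-local"` and `"overlap"` find the embedded `WW` for free. The only things that change between types are the border initialization and where the final score is read from.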


After noticing what you really want to do, I have my doubts that this method is fast enough for it.


Actually, pairwiseAlignment is quite performant. Based on my benchmarks, I should be able to process a whole lane of Illumina data in under an hour on a 48-core server (which I have).


Also, the PDF that you link to has an example that's almost what I want to do. So thanks for that as well.

14.2 years ago
Manuel ▴ 410

What exactly do you want to do?

  • Do you want to solve the read mapping problem (i.e. NGS reads against a reference genome)? Look at read mappers such as bowtie, bwa, RazerS, etc.
  • Do you want to do this on a smaller scale? The SeqAn library provides you with DP alignment algorithms that allow you to initialize the matrix borders with 0s, which will give you semiglobal alignments.

Actually, I want to mask a short (42 bp) sequence from a lane of paired-end 100 nt Illumina reads. The sequence is expected to occur anywhere within any read with equal probability, including a partial overlap on either end. So I need to do an ends-free alignment of the short sequence against each read independently.
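As a simplified illustration of the masking task, the sketch below slides the short sequence across a read without allowing indels, so it is not a true ends-free alignment, only an ungapped approximation of one. Offsets that hang off either end of the read give the partial overlaps; offsets inside the read always cover the whole sequence. The function name, the thresholds and the preference for longer overlaps are all assumptions chosen for the example:

```python
def mask_ungapped(read, target, max_mismatch_rate=0.1, min_overlap=8,
                  mask_char="N"):
    """Mask the best ungapped match of target in read (overhangs allowed)."""
    n, m = len(read), len(target)
    best = None  # ((-overlap, mismatches), start, end)
    # offset = read position where target[0] would sit; negative offsets
    # mean the target hangs off the left end of the read.
    for offset in range(-(m - 1), n):
        start = max(offset, 0)
        end = min(offset + m, n)
        overlap = end - start
        if overlap < min_overlap:
            continue  # too short to trust
        mismatches = sum(read[k] != target[k - offset]
                         for k in range(start, end))
        if mismatches > max_mismatch_rate * overlap:
            continue
        # Prefer longer overlaps, then fewer mismatches.
        key = (-overlap, mismatches)
        if best is None or key < best[0]:
            best = (key, start, end)
    if best is None:
        return read  # nothing to mask
    _, start, end = best
    return read[:start] + mask_char * (end - start) + read[end:]
```

The `min_overlap` setting matters: very short overlaps at the read ends will match by chance, so setting it too low would mask a lot of unrelated bases.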
