Looking for a local alignment library with certain features
2
0
Entering edit mode
6.7 years ago

I'm looking for a library that performs local alignment with the next features:

1- Returns the start and end position of query and subject

2- Returns more than one alignment

3- Do not require indexes or many I/O operations to run (time efficiency)

At the time I've found that:

scikit-bio -> StripedSmithWaterman and local_pairwise_align_ssw Only returns one alignment (the optimal)

biopython -> pairwise2 Does not return start and end position

https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Does not return target begin (even they're claiming so in the docs)

https://github.com/mbreese/swalign Only returns one alignment (the optimal)

Bioconductor returns the optimal and no positions

library(Biostrings)
mat <- nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = TRUE)
res1 <- pairwiseAlignment(pattern = "ACGT",subject ="ACT",gapOpening = 10.5, gapExtension = .5,substitutionMatrix = mat)

What I need is to find local alignments between two equal-length short sequences (~1500 bp) I'm using python at the time for programming the wrapper.

I'm using BLAST with word_size = 7, since this operation needs to be perfomed several times, it is not the best tool. One thing is the anount of I/O operations required to create query and subject files for BLAST communication.

BLAT with parameters -out=blast8 -oneOff=1 minScore=0 minIdentity=0 tileSize=6, reports much less results than BLAST

alignment localalignment • 2.2k views
ADD COMMENT
0
Entering edit mode

How about bowtie (or any of the popular NGS aligners). They return all what you asked for.

ADD REPLY
0
Entering edit mode

What I need is to find local alignments between two equal-length short sequences (~1500 bp). I read that bowtie is meant for aligning short sequences in large genomes, but worth the try

ADD REPLY
0
Entering edit mode

What is your goal? What do you want to achieve by this?

ADD REPLY
0
Entering edit mode

I want to find imperfect inverted repeats in sequences of about (~1500 bp) by local aligning with its reverse complementary

ADD REPLY
1
Entering edit mode

There are tools for that (e.g. einverted from EMBOSS package). Btw, as of BLAST, you can reduce the I/O operations regarding creating/deleting input FASTA files by providing your query and subject sequences directly from command line: blastn -query <(echo ">seq1"; echo "ATGC";) -subject <(echo ">seq1"; echo "ATGC";) -evalue 10000 -word_size 4.

ADD REPLY
0
Entering edit mode

I've tried to use this approach with Popen

'-query','<(echo ">q"; echo "' + seq + '";)',
'-subject','<(echo ">s"; echo "' + seq_rc + '";)',

but all I get is, BLASTN error: Command line argument error: Argument "subject". File is not accessible: `<(echo ">s"; echo...

ADD REPLY
0
Entering edit mode

I'm guessing you are using subprocess module. I think it would only go with os module. Btw, did you check this einverted software? If you on Ubuntu/Debian, the software is in repository, which is called emboss (apt-get install emboss). I think the program is what you are looking for.

ADD REPLY
0
Entering edit mode

What is the name of doing this: <(echo ">q"; echo "' + seq + '";)???

ADD REPLY
0
Entering edit mode

I think it is called command substition. (cmd) just runs the command, so it prints two echo that display your sequence as string and the sign < makes the sequence as if it came from the file.

ADD REPLY
0
Entering edit mode

einverted output is very hard to parse for me yet. Also std in/out is not very clear.

Still having some issues using os.popen(cmd).read() sh: -c: line 0: syntax error near unexpected token `('

If I run the same command (cmd variable) in the shell, I got a valid response.

ADD REPLY
0
Entering edit mode

Use os.system(cmd). But you will have to save results to a file (blastn ... -out output.txt) and then open file in Python and read it. So still you may have some I/O issues. If you decided to change your mind to try einverted I would help you with the parsing part.

ADD REPLY
0
Entering edit mode
6.7 years ago
mrals89 ▴ 60

From your comment to @ATpoint, it seems like you're looking for 'long' read alignment algorithms. I'd suggest BWA-mem for that (it seems to be the conventional wisdom).

But OP, it's not clear why you 'cant have indexes or many I/O operations.' Because BLAST uses indices!! I can't tell if you're trying to build a Lambda alignment function, or if you have time constraints and need to run a massive amount of 1k-5k paired alignments in parallel? Or if you're building a web application? Or if you need output in a certain format?

If you want multiple alignments to a single sequence, BLAST is not optimal, BLAT is reasonable, bowtie/bwa/hisat/etc will all give multiple alignments and query coordinates in SAM format, but they all require indices and and default parameters aren't optimal for long read alignments.

ADD COMMENT
0
Entering edit mode
6.7 years ago
h.mon 35k

I think lastz fits your needs.

ADD COMMENT

Login before adding your answer.

Traffic: 1978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6