I'm looking for a library that performs local alignment with the next features:
1- Returns the start and end position of query and subject
2- Returns more than one alignment
3- Do not require indexes or many I/O operations to run (time efficiency)
At the time I've found that:
scikit-bio -> StripedSmithWaterman and local_pairwise_align_ssw Only returns one alignment (the optimal)
biopython -> pairwise2 Does not return start and end position
https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library Does not return target begin (even they're claiming so in the docs)
https://github.com/mbreese/swalign Only returns one alignment (the optimal)
Bioconductor returns the optimal and no positions
library(Biostrings)
mat <- nucleotideSubstitutionMatrix(match = 5, mismatch = -4, baseOnly = TRUE)
res1 <- pairwiseAlignment(pattern = "ACGT",subject ="ACT",gapOpening = 10.5, gapExtension = .5,substitutionMatrix = mat)
What I need is to find local alignments between two equal-length short sequences (~1500 bp) I'm using python at the time for programming the wrapper.
I'm using BLAST with word_size = 7, since this operation needs to be perfomed several times, it is not the best tool. One thing is the anount of I/O operations required to create query and subject files for BLAST communication.
BLAT with parameters -out=blast8 -oneOff=1 minScore=0 minIdentity=0 tileSize=6, reports much less results than BLAST
How about bowtie (or any of the popular NGS aligners). They return all what you asked for.
What I need is to find local alignments between two equal-length short sequences (~1500 bp). I read that bowtie is meant for aligning short sequences in large genomes, but worth the try
What is your goal? What do you want to achieve by this?
I want to find imperfect inverted repeats in sequences of about (~1500 bp) by local aligning with its reverse complementary
There are tools for that (e.g. einverted from EMBOSS package). Btw, as of BLAST, you can reduce the I/O operations regarding creating/deleting input FASTA files by providing your query and subject sequences directly from command line:
blastn -query <(echo ">seq1"; echo "ATGC";) -subject <(echo ">seq1"; echo "ATGC";) -evalue 10000 -word_size 4
.I've tried to use this approach with Popen
but all I get is, BLASTN error: Command line argument error: Argument "subject". File is not accessible: `<(echo ">s"; echo...
I'm guessing you are using
subprocess
module. I think it would only go withos
module. Btw, did you check this einverted software? If you on Ubuntu/Debian, the software is in repository, which is called emboss (apt-get install emboss
). I think the program is what you are looking for.What is the name of doing this: <(echo ">q"; echo "' + seq + '";)???
I think it is called command substition.
(cmd)
just runs the command, so it prints twoecho
that display your sequence as string and the sign<
makes the sequence as if it came from the file.einverted output is very hard to parse for me yet. Also std in/out is not very clear.
Still having some issues using os.popen(cmd).read() sh: -c: line 0: syntax error near unexpected token `('
If I run the same command (cmd variable) in the shell, I got a valid response.
Use
os.system(cmd)
. But you will have to save results to a file (blastn ... -out output.txt) and then open file in Python and read it. So still you may have some I/O issues. If you decided to change your mind to try einverted I would help you with the parsing part.