$/usr/local/bin/needle seq1.fasta seq2.fasta needle.out
Needleman-Wunsch global alignment of two sequences
Error: Sequence is not nucleic
Error: Unable to read sequence 'seq2.fasta'
Died: needle terminated: Bad value for '-bsequence' and no prompt
Error happens even when specifying
/usr/local/bin/needle seq1.fasta seq2.fasta needle.out -datafile EBLOSUM62 -auto Y -awidth3 999999999
Can narrow it down to this confusing example:
THIS WORKS
==> seq1.fasta <==
>seq1
SVAS
==> seq2.fasta <==
>seq2
SATA
I got the same error (EMBOSS 6.3.1). What's interesting is that needle works fine when you change the order of your input sequences (providing seq2.fasta as the first one and then seq1.fasta).
S,V, T and A are all valid IUPAC letters for nucleotides whereas L is not. If needle guesses nuc/prot based on seq1 (nevermind what substitution matrix you specify), then it's easy to explain a.zielezinski's observation since if you reverse the order (seq2 first) it can only be prot, whereas with seq1 it could be either: DVARNTGAYTS.. all these letters are valid IUPAC nucleotide code. However, the E in EVACNTGAYTS is not valid nucleotide code. You even get the error - "Error: Sequence is not nucleic"
tldr: Unless specified, needle probably guesses nuc/prot based on IUPAC letters and if there's not a single prot specific letter in seq1 then the guess is nuc.
Basically the program goes: Hmm, I sure wish user would have told me if these were nuc or prot seqs. Oh well, let's guess based on the first seq. First letter could be both, second letter could be both, .., all could be both. Hmm, I guess it's then likely that this is nuc seq. I sure wish the dev would have included A+T+G+C vs the rest frequency comparison. Oh well. Let's read seq 2. Oh what the hell, this is not nuc. Let's throw an error..
The fix? Use:
/usr/local/bin/needle seq1.fasta seq2.fasta needle.out -datafile EBLOSUM62 -auto Y -awidth3 999999999 -sprotein1 Y -sprotein2 Y
I always use "-sprotein" flag to indicate I am aligning protein sequences when using needle. This should indicate both are proteins and save a few keystrokes. e.g. "needle asequence=seq1.fasta bsequence=seq2.fasta gapopen=3 gapextend=1 outfile=needle.out -sprotein"
I got the same error (
EMBOSS 6.3.1
). What's interesting is that needle works fine when you change the order of your input sequences (providingseq2.fasta
as the first one and thenseq1.fasta
).... I thought this was a weird error, the order mattering makes it twice as much.