I'm running local blast/2.2.30+ (on a server), command blastx (-task blastx-fast) against the nr database, with a query file of about 2000 FASTA nucleotide sequences of the form:
>12345678:1200-1400_+
ACGTACGTACGTAGCTAGCTAGCTGACTGACTG
where the first number is the genome GI, followed by start, stop, and strand.
The command I'm running is:
blastx –task blastx-fast -query filein.fa -db /../../../fdb/blastdb/nr -out fileout.fa -outfmt 11 -num_threads 24 -max_target_seqs 1 –max_hsps 1 -matrix BLOSUM62 -qcov_hsp_perc 95 –strand both
If I print line 537 of my input file with awk (the error refers to line 537), I get the following, which shows nothing unusual (lines 536 and 538 are nucleotide lines like this one):
$ awk '{ if (NR==537) print $0 }' file.fa
CCCTGAATTAGCAGTTAAACCATTCTTCCAATTAGCATATGACATTAATACACACCGTGGTTACTTCCGAATTTCACGTG
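
A line can look normal when printed and still contain characters the terminal does not show (a stray carriage return, a non-breaking space, and so on). As a rough sketch, and not something taken from the BLAST tools themselves, the Python below prints the `repr` of a range of lines and lists any character outside the IUPAC nucleotide alphabet. The function name `suspicious_lines` and the demo data are made up for illustration; whether hidden characters are actually the cause here is only a guess.

```python
import io

# IUPAC nucleotide codes plus gap; anything else on a sequence line is suspicious.
ALLOWED = set("ACGTUNRYSWKMBDHVacgtunryswkmbdhv-")


def suspicious_lines(handle, first, last):
    """Yield (line_number, repr_of_line, bad_chars) for lines first..last (1-based)."""
    for number, line in enumerate(handle, start=1):
        if number < first:
            continue
        if number > last:
            break
        text = line.rstrip("\n")
        if text.startswith(">"):
            # Header lines: only flag control and non-ASCII characters.
            bad = sorted({c for c in text if ord(c) < 32 or ord(c) > 126})
        else:
            bad = sorted(set(text) - ALLOWED)
        yield number, repr(text), bad


# Demo on in-memory data (hypothetical). For a real file, open it with
# newline="" so that carriage returns are not silently translated away, e.g.
#   open("file.fa", newline="", encoding="utf-8", errors="replace")
demo = io.StringIO(">12345678:1200-1400_+\nACGTACGT\r\nACGT\u00a0ACGT\n")
for number, shown, bad in suspicious_lines(demo, 1, 3):
    print(number, shown, bad)
```

For the case above, the range to check would be lines 536 to 538.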
Unfortunately this doesn't solve it. Copying in the wrong hyphen usually gives a 'didn't recognise flag' type error anyway.
Did you correct the other instances (–max_hsps and –strand)?
Yes, when I copy something in I correct them all as a matter of course.
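
For anyone who wants to check a pasted command mechanically rather than by eye, here is a small sketch that lists every non-ASCII character in a string, with its position and code point. The helper name `non_ascii_chars` is made up, and the command string is an abbreviated copy of the one in the question, used only as test input.

```python
def non_ascii_chars(command):
    """Return (position, character, code point) for every non-ASCII character."""
    return [
        (index, char, f"U+{ord(char):04X}")
        for index, char in enumerate(command)
        if ord(char) > 127
    ]


# Abbreviated version of the command from the question, en dashes included.
command = "blastx –task blastx-fast -query filein.fa –max_hsps 1 –strand both"
for position, char, code_point in non_ascii_chars(command):
    print(position, char, code_point)
```

An en dash shows up as U+2013, while the plain hyphen that command-line flags need is ASCII and is not reported.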
It seems to be related to output format 11. Output format 6, for instance, does at least begin running (although it later fails with 'Segmentation fault'?). Thanks
You may try a binary search on your input file, splitting it in halves until you find the offending sequence.
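
A minimal sketch of that halving step in Python, assuming the failure is caused by a single record and that it reproduces whenever that record is in the input. The function names are made up for illustration. `fails` is a hypothetical callable you would supply yourself: it would write the given records to a temporary file, run your blastx command on it, and return True if the run errors out.

```python
def read_records(path):
    """Return a list of FASTA records, each a list of lines (header first)."""
    records = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records.append([line])
            elif records:
                records[-1].append(line)
    return records


def write_records(records, path):
    """Write FASTA records (lists of lines) to path."""
    with open(path, "w") as handle:
        for record in records:
            handle.writelines(record)


def split_in_half(path, left_path, right_path):
    """Split a FASTA file into two files on a record boundary."""
    records = read_records(path)
    middle = len(records) // 2
    write_records(records[:middle], left_path)
    write_records(records[middle:], right_path)
    return middle, len(records) - middle


def find_offending(records, fails):
    """Halve the record list until one record remains.

    Assumes exactly one record triggers the failure; `fails` is user-supplied.
    """
    while len(records) > 1:
        middle = len(records) // 2
        left = records[:middle]
        records = left if fails(left) else records[middle:]
    return records[0]
```

Done by hand, you would call `split_in_half`, run blastx on both output files, keep whichever half fails, and repeat. With about 2000 sequences that takes roughly 11 rounds. Splitting on record boundaries rather than on raw line counts keeps each header together with its sequence.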