Hi guys, for some reason Biopython's Bio.Blast.Applications.NcbiblastnCommandline() is not returning any hits - even for sequences with 100% overlap.
Here is my self-contained example (though this problem is not limited to these specific sequences):
from Bio.SeqRecord import SeqRecord
from Bio import Entrez
Entrez.email = "A.N.Other@example.com" # Always tell NCBI who you are
from Bio.Blast.Applications import NcbiblastnCommandline
from Bio.Blast.Applications import NcbiblastpCommandline
from Bio.Blast import NCBIXML
from StringIO import StringIO
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.Alphabet import generic_dna
handle = Entrez.efetch(db="nucleotide", id="aj627603", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
SeqIO.write(record, ".cache/aj627603.fasta", "fasta")
cre_fw=SeqRecord(Seq("aaatttgcctgcattaccg", generic_dna), id="cre_fw")
SeqIO.write(cre_fw, ".cache/cre_fw.fasta", "fasta")
output = NcbiblastnCommandline(query=".cache/cre_fw.fasta", subject=".cache/aj627603.fasta", outfmt=5)()[0]
print(output)
output = NcbiblastpCommandline(query=".cache/cre_fw.fasta", subject=".cache/aj627603.fasta", outfmt=5)()[0]
print(output)
You can check the subject online and that see that it contains the query precisely. Not least of all I find it curious that NcbiblastpCommandline (p for proteins) gives me a correct hit while NcbiblastpCommandline (n for nucleotides) does not.