Biopython: Executing Blast On Custom Db
1
0
Entering edit mode
10.7 years ago
weslfield ▴ 90

Ok, so I am using the NcbiblastxCommandline wrapper to execute a BLAST search on a custom database.

cline = NcbiblastxCommandline(cmd='blastx', query=temp_path, out=blast_path, subject=path, outfmt=5, max_target_seqs=1)
os.system(str(cline))

The funny thing is that it is working just fine when I run those from the command line. However, inside my script, when I execute the exact same commands, I end up getting a blank xml results file instead of the actual results. I thought that maybe the script was not waiting for the process to complete, so I tried using subprocess and subprocess.wait(). That didn't work. I also tried using something else like time.sleep(60) which is plenty time enough for it to finish. That didn't work. So I am a bit out of ideas and am very puzzled as to why it works perfectly when executing as one-liners in the command shell after >>python but won't work from a script. Thanks for any suggestions!

biopython blast • 6.8k views
ADD COMMENT
0
Entering edit mode

Have you tried to run it like this?

cline = NcbiblastxCommandline(cmd='blastx', query=temp_path, out=blast_path, subject=path, outfmt=5, max_target_seqs=1)
stdout, stderr = cline()

What do stdout and stderr say?

ADD REPLY
0
Entering edit mode

In addition to Philipp's good question, add a print(str(cline)) to see what the command Biopython tries to run for you is exactly.

If you really want to use os.system(...) then you need to check for a non-zero return code (indicating an error).

ADD REPLY
1
Entering edit mode
10.7 years ago
Daniel ★ 4.0k

I'm can't be certain, but the cmd='blastx' is extra to how I am used to calling it, and you don't specify a database in your call (implied?)

This is how I run it:

def blastn(fas, db):
    print("blasting against the " + db + " database")

    blast_in = (fas)
    blast_out = (tmpdir + "blast_files/" + sample + ".blast")


    blastn_cline = NcbiblastnCommandline(query=blast_in, db=db, evalue=0.001, outfmt=6, out=blast_out, num_threads=12)

    stdout, stderr = blastn_cline()

    blast_err_log = open(tmpdir + "blast_err.txt", "w")
    blast_stdout_log = open(tmpdir + "blast_stdout.txt", "w")

    blast_err_log.write(stderr)
    blast_stdout_log.write(stdout)

    return blast_out

Hope this helps.

ADD COMMENT
0
Entering edit mode

Yes, the command was just copied from your post - what I'm interested in is how this looks:

stdout, stderr = blastn_cline()
print(stdout)
print(stderr)

Does stderr say anything special?

ADD REPLY
0
Entering edit mode

I can't tell, are you asking me or the Original Poster?

ADD REPLY
0
Entering edit mode

Prints both return blank lines. Exit status is 0. There are no errors.

ADD REPLY
0
Entering edit mode

@Mabeuf So the DB is specified with the subject=path part of the expression. You use subject when you want to use a custom database, instead of one like nr etc... There are no errors and no stdout, gives an exit status of 0. I'll try it with a function definition and see if that changes anything. As I said, it runs perfectly well from the python interpreter via command line but not when placed inside a script. It does sit in a loop in my program, so maybe that has something to do with it. I really am out of ideas.

ADD REPLY
2
Entering edit mode

No, you still use -db with a custom database (created using the makeblastdb command). You use -subject if you don't have a database, and instead want to search against a FASTA file doing pairwise comparisons (which will give potentially misleading e-values). See http://blastedbio.blogspot.co.uk/2012/05/blast-ingoring-search-space-size-for-e.html

ADD REPLY
0
Entering edit mode

Oh, ok. Thanks for the tip. So I went ahead and created a BLAST DB using makeblastdb which created 3 files with the extensions .phr .pin .psq (all begin with blastdb). So then I tried to rerun the blastx command using the -db flag and pointing to the blastdb. I ended up getting an xml file with no hits in it. Blast ran smoothly i.e. it found the newly created database but just didn't yield any hits in the xml file. Upon examining the .phr .pin .psq files, they have some very strange characters in them like 

ADD REPLY
1
Entering edit mode

The BLAST database files are binary - the 'string characters' are to be expected. Was your XML output file an empty file, or was there some XML but not BLAST hits?

ADD REPLY
0
Entering edit mode

There was some XML, but just no hits tagged. Gave the iteration message <iteration_message>No hits found</iteration_message>.

ADD REPLY
0
Entering edit mode

Also, the same problem I was having before where when I run things from the command shell, everything "works" fine but when I runt hem inside a python program, the XML output is totally blank...frustrating.

ADD REPLY
0
Entering edit mode

UPDATE: So i fixed the BLAST output problem when running it from the command line. I was using blastx instead of blastp and this, I think, was incompatible with the database type I was constructing with makeblastdb (type 'prot'). But I still cant get it to run properly from a Python program... still get the blank XML document when I should get a decent sized amount of hits. Any suggestions?

ADD REPLY
0
Entering edit mode

UPDATE 2: Fixed everything. For some reason it didn't like the fact that I had a file stream up while I was blasting. Had to close it, blast, then reopen it. Thanks for your help!

ADD REPLY
0
Entering edit mode

Personally, I've never used the 'subject=' call, I use 'db=' even for custom databases, so I can't comment on that. If you look at http://biopython.org/DIST/docs/api/Bio.Blast.Applications-pysrc.html at line 651 it says that "subject" is for "Subject sequence(s) to search." rather than a path to db, although it's not in the 'NcbiblastnCommandline' function. I'm just guessing though. Maybe worth a go?

ADD REPLY
0
Entering edit mode

Really, what you need is for Peter to drop in!

ADD REPLY
0
Entering edit mode

Regarding cmd='blastx' that has no effect (it is already the default value). Use the cmd=... option if your BLAST binary is not on the $PATH and therefore must be given in full.

ADD REPLY

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6