Biopython-Blast: Querying A Single Sequence Without Input File
11.0 years ago
Zealseeker • 0

Hello, I am studying how to use BLAST from Biopython, and one problem is troubling me. To use the function NcbiblastpCommandline (which is similar to blastp in BLAST+), I have to create a FASTA file, but I don't want to do that. I am building a website with Python where users can compare their sequence against my protein database. A 'hard' way to solve the problem is: when the user submits a sequence, the server creates a file and runs NcbiblastpCommandline.

Similarly, is it possible to get the output as a string rather than a file? Then I could extract the useful information easily without creating (and deleting) any files.

There is a similar post on Biostars, but it seems to assume a Linux shell: Local Blast: Querying A Single Sequence Without Input File. Possible ?

Thanks.

blast+ biopython • 8.3k views

blastp can read a query from stdin with "-query -" and, as I recall, it writes to stdout by default.

I am using a Python script. Can you provide a demo?

In terminal:

    cat file.fasta | blastp -db nr -outfmt 6 -query -

is the same as:

    blastp -db nr -outfmt 6 -query file.fasta

I don't know about Python.

I'm sorry, it doesn't work for me. My requirement is "no sequence file" (like file.fasta). The server receives the query sequence (held in RAM) and then outputs the result, just as the NCBI website does.

Well, like your link says, something like:

    echo -e ">Name\nATCGTTAGCT" | blastp -db nr -outfmt 6 -query -

works too.

I think I understand now! In the Windows command prompt I ran "echo XXX | blastp -db -outfmt -5" and it works. The next step is to call that command from a Python script. Thank you very much.
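
One way to call that command from Python without Biopython is to pass the FASTA text on stdin with `subprocess.run`. This is only a minimal sketch, assuming `blastp` from BLAST+ is on the PATH. The helper names (`build_blastp_args`, `parse_tabular`, `run_blastp`) are hypothetical and not part of any library. It asks for tabular output (`-outfmt 6`) and splits each hit into the twelve default columns.

```python
import subprocess

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def build_blastp_args(db):
    """Argument list for blastp reading the query from stdin ('-query -')."""
    return ["blastp", "-db", db, "-outfmt", "6", "-query", "-"]


def parse_tabular(text):
    """Turn tabular BLAST output into a list of dicts, one per hit."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        hits.append(dict(zip(COLUMNS, line.split("\t"))))
    return hits


def run_blastp(fasta_text, db):
    """Run blastp on an in-memory query; no query file is created."""
    result = subprocess.run(build_blastp_args(db), input=fasta_text,
                            capture_output=True, text=True, check=True)
    return parse_tabular(result.stdout)
```

A call would then look like `run_blastp(">query1\nNNAGFLDSNLIIVLNDN\n", "my_proteins")`, where `my_proteins` is a placeholder for your own formatted protein database. All values in the returned dicts are strings.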

Sorry, but what makes you think the NCBI website doesn't write the input sequence to a temporary file before running BLAST?

That is just my opinion. Otherwise the server would create and delete hundreds of thousands of files every day. If I ran the website, I would try to keep the sequence in RAM (as a variable instead of a file on disk). It would be faster and easier on the hard disk. So I am confused about why the BLAST+ commands generally expect file input.
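
For comparison, if a query file turns out to be needed after all, Python's `tempfile` module keeps the create-and-delete bookkeeping short. A minimal sketch; the helper name `temporary_fasta` is hypothetical, and the path it yields would be passed to `-query`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_fasta(fasta_text):
    """Write the query to a temporary file and delete it on exit."""
    # delete=False so the file can be closed and then reopened by another
    # process (needed on Windows); it is removed explicitly below.
    handle = tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False)
    try:
        handle.write(fasta_text)
        handle.close()
        yield handle.name
    finally:
        handle.close()
        os.remove(handle.name)
```

Used as `with temporary_fasta(text) as path: ...`, the file exists only inside the `with` block.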

11.0 years ago
    import shlex
    import subprocess
    from Bio.Blast.Applications import NcbiblastpCommandline

    query = 'NNAGFLD\nSNLIIVLNDN'  # your string from some external source
    blastp_cline = NcbiblastpCommandline(db="nr", outfmt=5)  # build the BLAST command
    # Split the command string into an argument list and open the pipes in text mode
    process = subprocess.Popen(shlex.split(str(blastp_cline)), stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                               universal_newlines=True)

    out, err = process.communicate(input=query)  # send the query on stdin and run
    print(out)
    print(err)

You don't need to call subprocess explicitly; the command line wrapper can be invoked directly and given a stdin string. Try:

    from Bio.Blast.Applications import NcbiblastpCommandline

    query = 'NNAGFLDSNLIIVLNDN'  # your string from some external source
    blastp_cline = NcbiblastpCommandline(db="nr", outfmt=5)  # BLAST command
    out, err = blastp_cline(stdin=query)  # the wrapper passes the string on stdin
    print(out)
    print(err)

Wow, yes, that is more elegant, I didn't know about that. In the Python documentation for Popen.communicate it mentions "The data read is buffered in memory, so do not use this method if the data size is large or unlimited." (hence my comment above) Is that still true for this method?

Edit: My guess is that it is still true, and that to avoid that problem you'd have to write stdout and stderr to files during Blast execution to keep the variables from taking up too much memory. Although this is probably a rare use case.

Yeah, any large stdout from BLAST is probably better sent to a file which can then also be reused. If you really want to efficiently parse the output from stdout gradually, use subprocess explicitly as per some of the examples in the Biopython Tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html
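
Reading the output gradually with subprocess could look roughly like this. It is a sketch under assumptions, not the Tutorial's code: the helper name `stream_output` is hypothetical, the query is small enough to write to stdin in one go, and stderr is left unredirected so it cannot fill a pipe.

```python
import subprocess


def stream_output(args, stdin_text):
    """Yield the command's stdout line by line instead of buffering it all."""
    process = subprocess.Popen(args, stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE, text=True)
    # Send the (small) query and close stdin so the program sees end-of-input.
    process.stdin.write(stdin_text)
    process.stdin.close()
    for line in process.stdout:
        yield line.rstrip("\n")
    process.stdout.close()
    # Report a failed run once all output has been consumed.
    if process.wait() != 0:
        raise subprocess.CalledProcessError(process.returncode, args)
```

For BLAST this might be called as `for line in stream_output(["blastp", "-db", "nr", "-outfmt", "6", "-query", "-"], fasta_text): ...`, handling one hit per line. A very large query would need a different approach, since writing all of stdin before reading stdout can block.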

Thank you very much. But do you know how I can use a multi-line query? It seems that the DOS command line doesn't support multi-line strings. I assigned a multi-line string to the query variable and it doesn't work.

Edited. Hope this works better. I don't know how it will perform on huge queries, but I think it should be OK for typical usage.
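
When the query is passed on stdin, the newlines only need to be part of the Python string, so a multi-line query can be built in memory. A small sketch, with a hypothetical helper `to_fasta` that adds a header line and wraps the sequence at a fixed width:

```python
def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text: a '>' header plus wrapped lines."""
    # Drop any whitespace or line breaks pasted along with the sequence.
    cleaned = "".join(sequence.split())
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">{}\n{}\n".format(name, "\n".join(lines))
```

The result can be given to either approach above as the stdin string, for example `to_fasta("query1", "NNAGFLD\nSNLIIVLNDN")`.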
