Fastest way to perform BLAST search using a multi-FASTA file against a remote database
3.3 years ago
accibio ▴ 20

I have a multi-FASTA file containing ~125 protein sequences, and I need to run a BLASTP search against the remote nr database. I tried using NcbiblastpCommandline, but it only accepts files as input. Because my file has so many sequences, submitting it in one go gives this error: ERROR: An error has occurred on the server, [blastsrv4.REAL]:Error: CPU usage limit was exceeded, resulting in SIGXCPU (24). Writing each sequence from the multi-FASTA file to a separate file, one at a time, works, but then the BLAST search becomes tremendously slow (~10 min/query on average, as opposed to ~1 min/query on the NCBI site).

from Bio import SeqIO
from Bio.Blast.Applications import NcbiblastpCommandline

blastp_results = []

for record in SeqIO.parse("AmpB_DEPs.fasta", "fasta"):
    # Write the current record to a temporary single-sequence FASTA file
    with open("test.txt", "w") as handle:
        handle.write(">" + record.description + "\n" + str(record.seq) + "\n")
    # Run blastp for this one query against the remote nr database
    blastp_cline = NcbiblastpCommandline(query="test.txt", db="nr", remote=True,
                                         evalue=0.05, outfmt="7 sseqid evalue qcovs pident")
    res = blastp_cline()  # returns a (stdout, stderr) tuple
    blastp_results.append(res)

I also tried using NCBIWWW.qblast, but it doesn't seem to provide query coverage in its output, which is important for my study.
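If the qblast output lacks a query-coverage column, the value can be approximated from the HSP query coordinates and the query length, both of which appear in BLAST XML output. The sketch below is a minimal, standard-library-only illustration; the function name `query_coverage` is hypothetical, and the result is an approximation of the `qcovs` idea (percent of the query covered by the union of all HSPs for one subject), not a guaranteed match to BLAST's own number.

```python
def query_coverage(hsp_ranges, query_length):
    """Percent of the query covered by the union of HSP ranges.

    hsp_ranges: iterable of (query_start, query_end) pairs, 1-based inclusive,
                all taken from the HSPs of a single subject sequence.
    query_length: length of the query sequence in residues.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    covered = 0
    cur_start, cur_end = None, None
    # Sort the ranges, then merge overlapping ones so no residue is counted twice
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsp_ranges):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1

    return 100.0 * covered / query_length
```

Assuming the XML is parsed with Bio.Blast.NCBIXML, the ranges for one alignment would come from `hsp.query_start` and `hsp.query_end`, and the length from the record's `query_length`; those attribute names should be checked against the installed Biopython version.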

Can somebody suggest a way to deal with this issue without compromising on the search space or BLAST's default parameters? Suggestions on running BLAST from other languages such as Perl, R, etc. would also be appreciated.
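One possible middle ground between submitting all ~125 sequences at once and writing one file per sequence is to split the multi-FASTA file into small batches and submit each batch as a single query file. The sketch below only shows the splitting step, using the standard library; the helper names (`read_fasta`, `batches`, `to_fasta`) and the batch size are hypothetical, and whether batching stays within NCBI's usage limits or is actually faster has not been verified here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


def to_fasta(batch):
    """Format a batch of (header, sequence) tuples back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in batch)
```

Each string returned by `to_fasta` could be written to its own file and passed as the `query` argument in place of the single-sequence `test.txt`.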

FASTA BLAST python biopython • 1.1k views

Public resources are there for everyone to share fairly. What you are trying to do exceeds the limits that NCBI considers fair use. You can either be patient and wait for your results, or invest in a cloud computing environment and run the search on a VM with multiple cores and plenty of RAM to be done sooner. If you try to defeat the guards NCBI has put in place, you may get IP banned.
