I have been trying to make a BLAST pipeline that takes a fasta sequence and puts it through BLAST and returns the Uniprot ID of the matching sequence and puts that into a text file.
I have been testing it only on one sequence for now (Glycine cleavage protein P in Hydra Vulgaris), its taken at time of writing over an hour and it outputs that it cannot find the sequence. If i search for it manually on Uniprot BLAST under the UniprotKB reference proteomes+Swissprot database it can find the sequence in under 2 mins.
So it definitely is a problem with my code. Am I searching in the wrong database? Thank you in advance for any help.
from Bio import SeqIO
from Bio.Blast import NCBIWWW, NCBIXML
def run_blast(sequence, output_file):
# Perform BLAST search against UniProtKB
result_handle_uniprotkb = NCBIWWW.qblast("blastp", "uniprotkb", sequence.seq)
blast_records_uniprotkb = NCBIXML.read(result_handle_uniprotkb)
# Extract UniProt IDs with the lowest E-value
best_hit_uniprotkb = min(blast_records_uniprotkb.alignments, key=lambda x: x.hsps[0].expect, default=None)
if best_hit_uniprotkb:
accession_id = best_hit_uniprotkb.accession
e_value = best_hit_uniprotkb.hsps[0].expect
database_used = "UniProtKB"
else:
accession_id = "N/A"
e_value = "N/A"
database_used = "No hits"
# Write the result to the output file
with open(output_file, "a") as f:
f.write(f">{sequence.id}\t{accession_id}\t{database_used}\t{e_value}\n")
result_handle_uniprotkb.close()
# Read the input FASTA file
input_file = "un.fasta"
output_file = "blast_results.txt"
with open(output_file, "w") as f:
f.write("")
for record in SeqIO.parse(input_file, "fasta"):
run_blast(record, output_file)
print("BLAST search completed. Results written to", output_file)
I addition to this,
if you were planning to use the web blast for many (hundreds) sequences then don't. Web blasting via command line is not meant to be used as a replacement for an install of stand alone local blast+.
Ok thank you, I didn't know this either.
Ah I see now, nr could be an option, Uniprot's ID mapping tool could convert the NCBI ascension numbers to Uniprot IDs. I wont have the computational resources for a local blast of that scale so I'll have to figure something else out regarding that. Thank you for your help