5.0 years ago
dllopezr
Hi!
I am parsing BLAST XML output with Biopython and, for performance reasons, I want to parallelize the process. The code takes a large BLAST XML output, performs a calculation over the alignments, and saves the result in an SQL database.
The code is as follows:
from Bio.Blast import NCBIXML
import pymysql
result_handle = open(*file*)
blast_records = NCBIXML.parse(result_handle)
*start sql connection
def calculation(blast_record):
    # do calculation
    # upload to sql
Normally, the code is executed this way:
for blast_record in blast_records:
    calculation(blast_record)
But when I try to use tools such as multiprocessing or joblib with a list comprehension like this:
processes = [mp.Process(target=calculation, args=(blast_record)) for blast_record in blast_records]
I get either an error or the code runs indefinitely without producing results.
Any help on how to structure the code to parallelize it, or any other advice?
You'll have to be more specific here.
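That said, two things in the snippet stand out regardless of the exact error. `args=(blast_record)` is not a tuple (it needs a trailing comma: `args=(blast_record,)`), and the list comprehension only builds `Process` objects, one per record, without ever calling `start()` or `join()` on them. A worker pool is usually a better fit than one process per record. Here is a minimal sketch, assuming the calculation only needs the HSP scores. The names `to_plain` and `run`, the max-score `calculation`, the worker count, and the chunk size are all placeholders of mine, not your actual code. The database write is left in the parent process because a connection opened before the workers start generally should not be shared between processes.

```python
import sys
from multiprocessing import Pool


def calculation(record):
    """Hypothetical stand-in for the real calculation.

    `record` is a (query, scores) tuple of plain Python values, which
    is cheap to pickle when it is sent to a worker process.
    """
    query, scores = record
    return query, max(scores, default=0.0)


def to_plain(blast_record):
    """Reduce a Biopython record to plain values before sending it to a worker."""
    scores = [hsp.score
              for alignment in blast_record.alignments
              for hsp in alignment.hsps]
    return blast_record.query, scores


def run(path, workers=4):
    # Imported here so the helpers above stay importable without Biopython.
    from Bio.Blast import NCBIXML

    with open(path) as handle, Pool(workers) as pool:
        plain = (to_plain(rec) for rec in NCBIXML.parse(handle))
        # imap accepts a generator, so the records never have to be built
        # into a list first; results come back in input order.
        for query, best in pool.imap(calculation, plain, chunksize=50):
            # Assumption: the SQL upload would go here, in the parent,
            # over a single connection. Printing stands in for it.
            print(query, best)


# The guard keeps worker processes from re-running the script body.
if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

Run it as `python script.py results.xml` (both file names are hypothetical). Parsing still happens serially in the parent, so this only helps if the calculation, not the XML parsing, is the slow part.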