Parallelization of a Biopython process
5.0 years ago
dllopezr ▴ 130

Hi!

I am parsing BLAST XML outputs with Biopython and, for performance reasons, I want to parallelize the process. The code takes a large BLAST XML output, performs a calculation over the alignments, and saves the results in a SQL database.

The code is as follows:

from Bio.Blast import NCBIXML
import pymysql
result_handle = open(file)   # file: path to the BLAST XML output
blast_records = NCBIXML.parse(result_handle)

# start SQL connection

def calculation(blast_record):
    # do calculation
    # upload to SQL

Normally the code is executed like this:

for blast_record in blast_records:
    calculation(blast_record)

But when I try to use tools such as multiprocessing or joblib with a list comprehension like this:

processes = [mp.Process(target=calculation, args=(blast_record)) for blast_record in blast_records]

I get either an error or the code runs indefinitely without producing results.

Any advice on how to structure the code so it can be parallelized?
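
For completeness, a corrected version of that list-comprehension attempt would look roughly like the sketch below (untested): args has to be a one-element tuple, so it needs a trailing comma, and each Process has to be started and joined, otherwise nothing ever runs. Even then, this creates one process per record, which does not scale to a large XML file.

import multiprocessing as mp

# args must be a tuple, hence the trailing comma; without start()/join()
# the processes are created but never executed
processes = [mp.Process(target=calculation, args=(blast_record,))
             for blast_record in blast_records]
for p in processes:
    p.start()
for p in processes:
    p.join()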

biopython sql parallelization • 905 views

I get either an error

You'll have to be more specific here.
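
One structure that may work here is a worker pool fed from the parser generator (a minimal sketch, assuming the per-record calculation is CPU-bound, that the parsed Blast record objects can be pickled, and that each worker opens its own database connection; the pool size, file path, and connection parameters below are placeholders, not taken from the original post):

import multiprocessing as mp
from Bio.Blast import NCBIXML
import pymysql

def init_worker():
    # each worker process opens its own connection; pymysql connections
    # cannot be shared across processes (placeholder credentials)
    global conn
    conn = pymysql.connect(host="localhost", user="user",
                           password="password", database="blast_db")

def calculation(blast_record):
    # do calculation over blast_record.alignments
    # upload the result to SQL through the per-worker connection `conn`
    pass

if __name__ == "__main__":
    with open("blast_output.xml") as result_handle:    # placeholder path
        blast_records = NCBIXML.parse(result_handle)   # lazy generator, consumed only in the parent
        with mp.Pool(processes=4, initializer=init_worker) as pool:
            # each parsed record is pickled and sent to a worker;
            # chunksize batches records to reduce inter-process overhead
            for _ in pool.imap_unordered(calculation, blast_records, chunksize=10):
                pass

Keeping the parsing in the parent and only distributing the records avoids sharing the open file handle between processes; if the database inserts rather than the calculation turn out to be the bottleneck, parallelizing the calculation will not help much.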

