Parallelization of Pairwise2 on Dataframe Rows?
2.2 years ago
ngarber ▴ 60

I have a dataframe in which each row holds a short sequence (motif) from a human protein and the full sequence of the homologous mouse protein. I'm using pairwise2 to align them and extract the mouse equivalent of the human motif.

Unfortunately, pairwise2 is very slow, so I would like to parallelize this process to speed it up, possibly with Dask or another multiprocessing platform. How would I do that for a multi-line operation as follows?

for i in np.arange(len(data_df)): 
    human_motif = data_df.at[i, "Human_Motif"]
    mouse_sequence = data_df.at[i, "Mouse_Sequence"]

    gap_start_penalty = -15
    gap_extend_penalty = -15
    alignments = pairwise2.align.globalxs(human_motif, mouse_sequence, gap_start_penalty, gap_extend_penalty)

    best_alignment_human = alignments[0][0]
    best_alignment_mouse = alignments[0][1]

    #Find index for when the gapless aligned human motif starts
    for j, char in enumerate(best_alignment_human): 
        if char != "-": 
            aligned_motif_start = j
            break

    mouse_motif = mouse_sequence[aligned_motif_start : aligned_motif_start + len(human_motif)]

    data_df.at[i, "Mouse_Motif"] = mouse_motif

What's the best way to parallelize this?

BioPython parallelization BLAST Python pairwise2 • 617 views
2.2 years ago
zorbax ▴ 650

You can use Pool from the multiprocessing module:

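import numpy as np
import pandas as pd
from Bio import pairwise2
from multiprocessing import Pool

PROCESSES = 8  # Pool starts worker processes, not threads

def pairwise_alignment(df):
    # Align every row of one chunk of the dataframe.
    alignments = []
    for human_motif, mouse_sequence in zip(df["Human_Motif"], df["Mouse_Sequence"]):
        gap_start_penalty = -15
        gap_extend_penalty = -15
        result = pairwise2.align.globalxs(human_motif, mouse_sequence, gap_start_penalty, gap_extend_penalty)
        alignments.append(result)
    return alignments


if __name__ == "__main__":
    # data_df is the dataframe from the question.
    # Passing data_df itself to pool.map would iterate over its column names,
    # so split the rows into one chunk per worker first.
    row_groups = np.array_split(np.arange(len(data_df)), PROCESSES)
    chunks = [data_df.iloc[rows] for rows in row_groups]
    with Pool(processes=PROCESSES) as pool:
        # One list of alignment results per chunk, in row order.
        pool_results = pool.map(pairwise_alignment, chunks)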
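
The code above hands back raw alignment lists, so the motif extraction from the question still has to run afterwards. One alternative is to parallelize per row and have each worker return only the extracted mouse motif, which keeps the data sent between processes small. The following is a minimal sketch under some assumptions, not a definitive implementation: `motif_from_alignment`, `align_row` and `parallel_mouse_motifs` are hypothetical names, and `chunksize=50` is an arbitrary starting value. Unlike the loop in the question, the helper subtracts any gaps in the mouse row that precede the motif; when there are none, it gives the same slice.

```python
from multiprocessing import Pool

GAP_OPEN = -15    # same penalties as in the question
GAP_EXTEND = -15


def motif_from_alignment(aligned_human, aligned_mouse, motif_length):
    """Return the mouse residues starting where the aligned human motif starts."""
    # Number of leading gap columns = first column holding a human residue.
    start_col = len(aligned_human) - len(aligned_human.lstrip("-"))
    # Convert the column index to a position in the ungapped mouse sequence.
    mouse_start = start_col - aligned_mouse[:start_col].count("-")
    ungapped_mouse = aligned_mouse.replace("-", "")
    return ungapped_mouse[mouse_start:mouse_start + motif_length]


def align_row(pair):
    """Align one (human_motif, mouse_sequence) pair and return the mouse motif."""
    # Imported inside the worker function so the module is loaded in the
    # process that actually runs the alignment.
    from Bio import pairwise2

    human_motif, mouse_sequence = pair
    # one_alignment_only=True avoids enumerating equally scoring alternatives;
    # it is an optional speed-up and can be dropped.
    best = pairwise2.align.globalxs(
        human_motif, mouse_sequence, GAP_OPEN, GAP_EXTEND, one_alignment_only=True
    )[0]
    # best[0] and best[1] are the aligned human and mouse strings.
    return motif_from_alignment(best[0], best[1], len(human_motif))


def parallel_mouse_motifs(pairs, processes=8, chunksize=50):
    """Run align_row over all pairs in a pool of worker processes."""
    with Pool(processes=processes) as pool:
        # map preserves input order, so results line up with the input rows.
        return pool.map(align_row, pairs, chunksize=chunksize)
```

With the dataframe from the question, this would be called from under an `if __name__ == "__main__":` guard, along the lines of `data_df["Mouse_Motif"] = parallel_mouse_motifs(list(zip(data_df["Human_Motif"], data_df["Mouse_Sequence"])))`. Whether it ends up faster than the serial loop depends on how long each alignment takes relative to the cost of starting the workers, so it is worth timing on a subset first.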