How to apply BioPython function to every row of CSV
5.4 years ago
ishackm ▴ 110

Hi all,

I have the following code:

import pandas as pd
from Bio import pairwise2

# Import format_alignment method
from Bio.pairwise2 import format_alignment

data = pd.read_csv("../Results/dave.csv")
x =data["S1"] 
y =data["S2"]

for i in range(0,len(data)):
    X=x[i]
    Y=y[i]
    global_align = pairwise2.align.globalms(X, Y, 10, -2, -1, -1)
    score = global_align[0][2]


# A match score is the score of identical chars, else mismatch score.
# Same open and extend gap penalties for both sequences.

# matches = 10
#mismatch = -2
# gap = -1
# extending = -1

score_list=[]
score_list.append(score)

print(score)

The CSV Dataset:

S1  S2
AAC AAA
BBB BBBAAA

The output:

 27.0

The code only applies to the last row of the CSV file. I would like scores generated for each row of the csv please.

Desired Output:

15.0
27.0

Please note that I am very new to Python and BioPython so any help will be greatly appreciated.

Many Thanks,

Ishack

biopython python csv alignment • 1.6k views

Does anyone know how to solve this please?

5.4 years ago
Asaf 10k

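You compute score inside the loop but never append it to the list there. score_list is created and appended to only once, after the loop has finished, so it only ever holds the last row's score. You also print score rather than the list.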
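
A minimal sketch of that fix, assuming the same pairwise2 call and scoring parameters as in the question. The names global_score, score_rows and score_fn are hypothetical and only serve the illustration. The list is created once before the loop and appended to on every iteration:

```python
def global_score(seq_a, seq_b):
    """Score of the first global alignment, using the question's parameters."""
    # Imported lazily (an illustrative choice) so that score_rows below can be
    # tried with a stand-in scorer even where Biopython is not installed.
    from Bio import pairwise2

    # match = 10, mismatch = -2, gap open = -1, gap extend = -1
    global_align = pairwise2.align.globalms(seq_a, seq_b, 10, -2, -1, -1)
    return global_align[0][2]  # index 2 of an alignment is its score


def score_rows(first_column, second_column, score_fn=global_score):
    """Return one score per row, in row order."""
    score_list = []  # created once, before the loop
    for seq_a, seq_b in zip(first_column, second_column):
        # Appending inside the loop keeps every row's score, not just the last.
        score_list.append(score_fn(seq_a, seq_b))
    return score_list
```

With the DataFrame from the question, score_list = score_rows(data["S1"], data["S2"]) should give one score per row, and print(*score_list, sep="\n") would then print them one per line.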


Hi Asaf, thanks for your response, but can you show me what you mean in the code please?


I think you can figure it out. Just go over the code and run it in your head.

5.3 years ago

Modify your code like this, so that print(score) runs inside the loop. The rest is fine.

import pandas as pd
from Bio import pairwise2

from Bio.pairwise2 import format_alignment

data = pd.read_csv("../Results/dave.csv")
x =data["S1"] 
y =data["S2"]

for i in range(0,len(data)):
    X=x[i]
    Y=y[i]
    global_align = pairwise2.align.globalms(X, Y, 10, -2, -1, -1)
    score = global_align[0][2]
    print(score)

Isn't OP better off appending to the array (rather than printing) within the loop? That way, they can choose to do what they want with the set of scores after processing is done. print just passes on downstream work to the calling application, which is not great.
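
As a rough illustration of that point, the sketch below keeps each score next to its row so later steps can sort, filter or save the results. It uses only the standard library and assumes a comma-separated table with S1 and S2 headers. The names add_score_column and count_identical_positions are hypothetical, and the scorer is a deliberately simple placeholder, not an alignment; in practice the pairwise2 call from the question would be passed in its place.

```python
import csv
import io


def add_score_column(csv_text, score_fn):
    """Return the rows of an S1,S2 table, each with a 'score' entry added."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        row["score"] = score_fn(row["S1"], row["S2"])
        rows.append(row)  # every row is kept, together with its score
    return rows


def count_identical_positions(seq_a, seq_b):
    """Placeholder scorer (NOT an alignment): equal characters at equal positions."""
    return sum(a == b for a, b in zip(seq_a, seq_b))


# Hypothetical in-memory table mirroring the two rows shown in the question.
rows = add_score_column("S1,S2\nAAC,AAA\nBBB,BBBAAA\n", count_identical_positions)

# Because the scores were stored rather than printed, downstream work is easy:
best = max(rows, key=lambda row: row["score"])
```

With pandas, the equivalent step would be assigning the collected list to a new column, for example data["score"] = score_list.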


The question that was asked just mentioned that the output needs to be printed in the given format but mentioned nothing about storing those scores. Hence I suggested that solution.
