How To Calculate Number Of Gap Windows And Their Length In A Multiple Alignment
1
1
Entering edit mode
12.2 years ago
Biojl ★ 1.7k

Hi,

Anyone knows if there is an implemented function in biopython to calculate gap windows and their length in a multiple sequence alignment. I found a module (http://biopython.org/DIST/docs/api/Bio.Align.AlignInfo-pysrc.html) to calculate the consensus sequence but it do not allow to set that 1 gap in any sequence should give an output of a gap in the consensus. My input is a Clustalw alignment.

Seq1 AAATCG--
Seq2 ---TCGAA
Seq3 -AATCGAA
Seq4 -AATCGAA
        ***

consensus would be: ---TCG--

desired output: gap windows = 2

window1->length 3

window2->length 2

Any ideas?

python alignment biopython • 4.7k views
ADD COMMENT
0
Entering edit mode

would it not be easier to remove the "-" afterwards? Unless I am missing what you are trying to accomplish?

ADD REPLY
0
Entering edit mode

I'm just interested in getting the number of gap windows and their length. Since the clustal format only gives you * or : or blank as output (consensus) I thought it was first needed to create this consensus showing the gaps, then count them.

ADD REPLY
3
Entering edit mode
12.2 years ago
Biojl ★ 1.7k

This is the solution I came up with. It uses the SeqIO module from Bio (Biopython).

from Bio import SeqIO
handle = open('file_name', "rU") #File name corresponds to a multiple alignment in clustal format
records = list(SeqIO.parse(handle, "clustal"))

def calc_gaps(records):
    j=0
    gaps = 0
    gap = 0
    window_len = 0
    windows = []
    while j<len(records[0].seq): #j iterates through the positions of each sequence
        i=0
        while i< len(records):
            if records[i].seq[j] == '-':
                gap = 1
                break
            i+=1
        if gap==1:
            window_len +=1
            gaps=gaps+1
        else:
            if window_len!=0:
                windows.append(window_len)
                window_len=0
        j+=1
        gap=0
        if j==len(records[0].seq) and window_len!=0:
            windows.append(window_len)
    return gaps, len(windows), (float( sum(windows) ) / len(windows))

(gaps,windows,mean)=calc_gaps(records) #Here we call the function and get number of gaps, number of windows and mean number of gaps per window

I hope this will be useful for someone else.

ADD COMMENT
0
Entering edit mode

+1 for posting your solution. I was wanting to post some code using R, but got caught up in work. Glad to see you found a solution.

ADD REPLY

Login before adding your answer.

Traffic: 1758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6