Question

Finding Difference Values Based on Clustal Omega Distance Matrices

0

9.8 years ago

Bara'a ▴ 270

Hi all :)

I have a question about the distance matrices produced by Clustal Omega.

It is well known that they represent the similarity between each pair of sequences, either as a distance or as a percentage, as follows:

100.000000 21.944035 22.133939 23.723042 19.750284 20.431328 20.885358 21.679909
21.944035 100.000000 22.827688 21.796760 22.974963 20.324006 21.944035 24.889543
22.133939 22.827688 100.000000 21.152030 22.474032 17.387033 19.830028 20.963173
23.723042 21.796760 21.152030 100.000000 20.437018 24.361493 19.059107 19.436957
19.750284 22.974963 22.474032 20.437018 100.000000 21.414538 20.094259 21.765210
20.431328 20.324006 17.387033 24.361493 21.414538 100.000000 20.432220 20.432220
20.885358 21.944035 19.830028 19.059107 20.094259 20.432220 100.000000 19.018898
21.679909 24.889543 20.963173 19.436957 21.765210 20.432220 19.018898 100.000000

But what if I wanted to find the percentage difference between each pair of sequences, based on those matrices?

I'm working on a pipeline that needs to retrieve similarity values >= 90.00 for the left flanking region and difference values >= 50.00 for the right flanking region. Here's the code snippet I wrote to do that:

import numpy as np

files=['Arr-Right(Aestivum_Japonica).dst','Arr-Left(Aestivum_Japonica).dst']
for i in range(len(files)):
    # Keep the part of the file name between "-" and ".", e.g. "Right(Aestivum_Japonica)"
    name=files[i][files[i].find("-")+1:files[i].find(".")]
    retrieved=open("Rtrv-"+name+".csv",'w',newline='')
    retrieved.write('{0:^14}\t{1:^8}\t{2:^10}\n'.format("Similarity (%)","Query ID","Subject ID"))
    data=np.genfromtxt(files[i])
    for row_idx, row in enumerate(data):
        for col_idx, element in enumerate(row):
            # Skip the diagonal and the lower triangle so each pair is visited once
            if row_idx >= col_idx:
                continue
            # Left flank: keep pairs whose similarity is at least 90%
            elif ("Left" in name and element>=90.000000):
                retrieved.write('{0:10.6f}\t{1:0d}\t{2:0d}\n'.format(element,row_idx,col_idx))
            # Right flank: keep pairs whose difference (100 - similarity) is at least 50%
            elif ("Right" in name and (100-element)>=50.000000):
                retrieved.write('{0:10.6f}\t{1:0d}\t{2:0d}\n'.format(element,row_idx,col_idx))
    retrieved.close()

My question is about the correctness of the equation I used: is it simply (100-element)>=50.000000, or am I missing something?

Thanks in advance

Edited to add the list of file names to the code snippet.

clustal-omega distance-matrix python • 2.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270

0

Would someone help me with this, please?

I really need to get the right answer. Thank you all.

ADD REPLY • link updated 9.8 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270

0

Looks good to me, though I don't understand the first 4 lines of code. Maybe explain the code a little bit?

ADD REPLY • link 9.8 years ago by Ram 44k

0

@RamRS... The first 4 lines iterate over a list of matrix file names, strip the prefix I added earlier to distinguish them from other files, add a new prefix for the result file name, open that file for writing, and write a header before the filtering part starts.

I wrote it that way to avoid overwriting files and to keep the final file names free of the old prefixes and suffixes, that's all :)
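To make the file name handling concrete, here is a minimal sketch of just that step. The helper name `output_name` is hypothetical and not part of the original snippet; it only repeats the same slicing on the two file names from the question.

```python
def output_name(dst_file):
    """Derive the result file name from a matrix file name (hypothetical helper)."""
    # Keep the text between the first "-" and the first "."
    name = dst_file[dst_file.find("-") + 1:dst_file.find(".")]
    # Add the new prefix and extension used for the retrieved results
    return "Rtrv-" + name + ".csv"


print(output_name("Arr-Right(Aestivum_Japonica).dst"))
print(output_name("Arr-Left(Aestivum_Japonica).dst"))
```

Note that `find` returns the first match, so this assumes the only "-" and "." that matter are the first ones in the file name.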

ADD REPLY • link updated 9.8 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270

0

Oh, I see. Does the code work?

ADD REPLY • link 9.8 years ago by Ram 44k

0

@RamRS... Yes, it works perfectly :D

I'm afraid there may be a conceptual error in that equation. Can you please confirm its correctness for me?

ADD REPLY • link updated 9.8 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270

0

That is what I was wondering as well, but I guess 100-similarity is a crude measure of dissimilarity. How else would you find a quantifying parameter for difference from similarity matrices?
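As a rough illustration of the 100-similarity idea, the sketch below computes the difference for every pair at once with NumPy and keeps the pairs at or above a threshold. The function name `pairs_by_difference` and the `min_difference` parameter are made up for this example, and the sample values are the top-left 3x3 corner of the matrix in the question. It assumes the matrix holds percent similarity on a 0 to 100 scale.

```python
import numpy as np


def pairs_by_difference(similarity, min_difference=50.0):
    """Return (row, col, difference) for each pair at or above the threshold.

    Assumes `similarity` is a square matrix of percent similarities (0-100).
    """
    similarity = np.asarray(similarity, dtype=float)
    # Crude dissimilarity: whatever is not similar, in percent
    difference = 100.0 - similarity
    # Indices of the upper triangle without the diagonal, so each pair appears once
    rows, cols = np.triu_indices_from(similarity, k=1)
    keep = difference[rows, cols] >= min_difference
    return [(int(r), int(c), float(difference[r, c]))
            for r, c in zip(rows[keep], cols[keep])]


# Top-left 3x3 corner of the matrix posted in the question
sample = np.array([
    [100.000000, 21.944035, 22.133939],
    [21.944035, 100.000000, 22.827688],
    [22.133939, 22.827688, 100.000000],
])

for r, c, d in pairs_by_difference(sample):
    print(r, c, round(d, 6))
```

With similarities this low, every pair in the sample passes the 50% difference threshold, which is the same result the loop in the question would give for the "Right" file.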

ADD REPLY • link 9.8 years ago by Ram 44k

Ram · Accepted Answer · 2015-02-27

2

9.8 years ago

Bara'a ▴ 270

This is the reply I got from the clustalw team:

< image not found >

So, I think the equation is correct, @RamRS!

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270

1

Good job on asking them and on posting the follow-up!

ADD REPLY • link 2.6 years ago by Ram 44k

0

Thanks :)

Hope this helps others facing the same issue.

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Bara'a ▴ 270