Finding Min Value In List Of Lists -- Python -- Without Numpy

0

11.1 years ago

st.ph.n ★ 2.7k

I have a distance matrix, produced from Jukes-Cantor estimation of pairwise distances made from Clustal. Because the matrix is stored as a list of lists, I'm having trouble finding the minimum value and its index, which I need as the starting point for a UPGMA algorithm. I would like to do this in a more "pythonic" way, and without numpy.

The matrix looks like this:

            410488935     410488927     410488931     410488939     410488937     410488923     410488933     
410488935     0.0000
410488927     0.0065 0.0000
410488931     0.0098 0.0098 0.0000
410488939     0.0850 0.0850 0.0784 0.0000
410488937     0.0817 0.0817 0.0752 0.0033 0.0000
410488923     0.0817 0.0817 0.0752 0.0033 0.0065 0.0000
410488933     0.1340 0.1340 0.1275 0.1340 0.1307 0.1307 0.0000

I pulled the sequence identifiers from the rows and columns, into two separate lists. I then replaced the "0" diagonal with X's, to aid in finding the minimum value. Here are the new lists:

[['410488935', '410488927', '410488931', '410488939', '410488937', '410488923', '410488933']] 

[[['X'], ['0.0065', 'X'], ['0.0098', '0.0098', 'X'], ['0.0850', '0.0850', '0.0784', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', 'X'], ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'], ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X']]]

This is the small snippet I have so far to find the position of the min val and the val itself:

def identify_min(e):
    return min(
    (n, i, j)
    for i, L2 in enumerate(e)
    for j, n in enumerate(L2)
    )[1:]
    minval = float(e([lowrow][lowcol]))
    return minval, lowrow, lowcol
print identify_min(matrix)

However, the output of this function is (0, 1), whereas I believe it should be (0.0033, (4, 3)).
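A likely explanation, judging from the lists as printed above: the matrix is wrapped in an extra pair of brackets, so `enumerate(e)` sees a single element (the whole list of rows) and the generator compares entire rows as lists of strings. The row `['0.0065', 'X']` sorts first, which yields `(0, 1)`. The early `return` also makes the last two lines of the function unreachable. Below is a minimal sketch of a corrected version, written for Python 3. It assumes the extra nesting and the 'X' placeholders shown above; the unwrapping step and the `placeholder` parameter are illustrative additions, not part of the original snippet.

```python
def identify_min(matrix, placeholder="X"):
    """Return (minimum value, row index, column index) for a lower-triangular
    distance matrix stored as a list of lists of strings."""
    # Assumption: the matrix may be wrapped in one extra list, as in the
    # printed example ([[[...], [...], ...]]); unwrap it if so.
    if len(matrix) == 1 and matrix[0] and isinstance(matrix[0][0], list):
        matrix = matrix[0]
    # Compare (value, row, col) tuples. Converting each cell to float avoids
    # string comparison, and placeholder cells on the diagonal are skipped.
    return min(
        (float(cell), i, j)
        for i, row in enumerate(matrix)
        for j, cell in enumerate(row)
        if cell != placeholder
    )


matrix = [[
    ['X'],
    ['0.0065', 'X'],
    ['0.0098', '0.0098', 'X'],
    ['0.0850', '0.0850', '0.0784', 'X'],
    ['0.0817', '0.0817', '0.0752', '0.0033', 'X'],
    ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', 'X'],
    ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', 'X'],
]]

print(identify_min(matrix))  # (0.0033, 4, 3)
```

When a value occurs more than once, tuple comparison falls back to the indices, so the tie at 0.0033 resolves to the cell with the smaller row index.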

python • 14k views

updated 3.4 years ago by Ram 45k • written 11.1 years ago by st.ph.n ★ 2.7k

3

11.1 years ago

Alex Reynolds 36k

Note that you have two cells with the value of 0.0033. Which one do you pick?

One option is to build a Python dictionary in which the keys are cell values and each value is a list of sequence identifier pairs. You append a pair to the relevant list as you encounter each non-zero cell. At the end, print the minimum key and the list associated with it.

Consider the following tab-delimited input matrix file:

	410488935 410488927 410488931 410488939 410488937 410488923 410488933
	410488935 0.0000
	410488927 0.0065 0.0000
	410488931 0.0098 0.0098 0.0000
	410488939 0.0850 0.0850 0.0784 0.0000
	410488937 0.0817 0.0817 0.0752 0.0033 0.0000
	410488923 0.0817 0.0817 0.0752 0.0033 0.0065 0.0000
	410488933 0.1340 0.1340 0.1275 0.1340 0.1307 0.1307 0.0000

Here is a script that takes this input and outputs the minimum key and a list of sequence id pairs associated with that minimum value:

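#!/usr/bin/env python
# Python 2 (uses xrange and the print statement)

import sys

fn = sys.stdin

ids = list()   # column identifiers, taken from the header row
vd = dict()    # distance value -> list of [column id, row id] pairs
rowIdx = -1    # stays at -1 until the header row has been read

for line in fn:
    vals = line.strip("\n\t").split("\t")
    if rowIdx == -1:
        ids = vals
    else:
        yid = vals[0]
        vals.pop(0)
        for colIdx in xrange(len(vals)):
            xid = ids[colIdx]
            val = float(vals[colIdx])
            # skip the zero diagonal; start a new list for an unseen value
            if val > 0.0 and val not in vd.keys():
                vd[val] = list()
            if val > 0.0:
                vd[val].append([xid, yid])
    rowIdx += 1

print min(vd), vd[min(vd)]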

You'd run it something like this:

$ minSeqIdLister.py < seqIdTest.mtx
0.0033 [['410488939', '410488937'], ['410488939', '410488923']]

As you can see, you can find two sequence id pairs for the value 0.0033. You can keep them all, or pick the first, or pick one at random - what you do next is up to you.

Since your distance matrix is presumably symmetric, it doesn't matter in which order you store the ids in a pair. If you want to, you could store the row and column indices instead of the sequence identifiers, by changing what is appended to the dictionary vd.
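As a sketch of that last suggestion, the same dictionary idea can be applied directly to an in-memory list of lists, storing (row, column) index pairs rather than identifiers. This is written for Python 3. The function name `min_cells` and the use of `collections.defaultdict` are illustrative choices, not part of the script above, and the input is assumed to be the lower triangle with the zero diagonal still in place.

```python
from collections import defaultdict


def min_cells(rows):
    """Map each non-zero distance to the (row, col) cells that hold it and
    return the smallest distance together with all of its cells."""
    cells = defaultdict(list)
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            value = float(cell)
            if value > 0.0:  # skip the zero diagonal
                cells[value].append((i, j))
    smallest = min(cells)
    return smallest, cells[smallest]


ids = ['410488935', '410488927', '410488931', '410488939',
       '410488937', '410488923', '410488933']

# Lower triangle of the example matrix, one list per row.
rows = [
    ['0.0000'],
    ['0.0065', '0.0000'],
    ['0.0098', '0.0098', '0.0000'],
    ['0.0850', '0.0850', '0.0784', '0.0000'],
    ['0.0817', '0.0817', '0.0752', '0.0033', '0.0000'],
    ['0.0817', '0.0817', '0.0752', '0.0033', '0.0065', '0.0000'],
    ['0.1340', '0.1340', '0.1275', '0.1340', '0.1307', '0.1307', '0.0000'],
]

value, pairs = min_cells(rows)
print(value, pairs)  # 0.0033 [(4, 3), (5, 3)]

# Indices can be mapped back to identifiers when needed.
print([(ids[i], ids[j]) for i, j in pairs])
```

Taking `pairs[0]` keeps the first tied cell, while `random.choice(pairs)` from the standard library would pick one at random.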

updated 3.4 years ago by Ram 45k • written 11.1 years ago by Alex Reynolds 36k

0

A Python variation:

#!/usr/bin/env python

import sys

ids = list()
vd = dict()

with open(sys.argv[1]) as fn:
    next(fn)
    for line in fn:
        vals = line.split()
        ids.append(vals.pop(0))
        for colIdx in xrange(len(vals) - 1):
            if vals[colIdx] not in vd.keys():
                vd[vals[colIdx]] = list()
            vd[vals[colIdx]].append([ids[colIdx], ids[-1]])

print min(vd), vd[min(vd)]

And in Perl:

use strict;
use warnings;
use List::Util qw/min/;
use Data::Dump;

my ( %vd, @ids );
<>;

while (<>) {
    my @vals = split;
    push @ids, shift @vals;
    push @{ $vd{ $vals[$_] } }, [ $ids[$_], $ids[-1] ] for 0 .. $#vals - 1;
}

print min ( keys %vd ), dd $vd{ min keys %vd };

11.1 years ago by Kenosis ★ 1.3k
