Design Pattern: Comparing Alleles
2
4
Entering edit mode
12.4 years ago

I've been asked to create an algorithm that will compare two genotyping platforms.

The algorithm would compare two genotypes and produce a message ("this is the same genotype, anti-parallele, incompatible, inversed, ....").

Each allele can be encoded as: 'A', 'T', 'G', 'C' , '0' (undefined), '-' (indel) /[ATGC]{2,}/ (indel)

and all the conditions must be considered

A1----B1
 | \/ |
 | /\ |
A2----B2

... but many returns the same message:

A/C vs G/T  "antiparallele"
A/G vs C/T  "antiparallele"
A/A vs A/A  "same"
G/G vs G/G  "same"
A/T vs A/A   user-message-1
A/C vs C/T   "inverted"
A/0 vs T/T  user-message-2

Currently, my code is just an ugly series of imbricated 'else if' statements like

if(same(A0,B0) && same(A1,B1) && !revcomp(A0,A1)) { ... }

and I'm afraid some conditions have been ignored, or I would like to be able to quickly change the way a pattern is handled.

Do you know if there is any design pattern to handle this problem ? Would you have any elegant solution for this ?

Pierre

PS: It looks like a problem for a "rules engine" like jboss drools but I don't like this library ( requires too many dependencies).

allele genotyping algorithm • 2.3k views
ADD COMMENT
4
Entering edit mode
12.4 years ago

I would approach it from the resulting message point of view rather than the input.

For each resulting message, write a check algorithm. Store all the check algorithms in a structure and run each input through the gamut of checks.

For example (in python):

def antiparallele(input):
   #my rules for antiparallele. print antiparallele if true

def same(input):
   #my rules for same. print same if true

def inverted(input):
   #my rules for inverted. print inverted if true

myTests = [antiparallele, same, inverted]

myInputs = [...]

for input in myInputs:
   for test in myTests:
      test(input)

It might not be the fastest or the least redundant way, but certainly the most manageable. You are also able to have the possibility of multiple results per input.

ADD COMMENT
1
Entering edit mode

I also think that this is the right way to go forward. I had to implement a similar algorithm (albeit much simpler) as Pierre and worked my way backwards in the way you described here too.

ADD REPLY
3
Entering edit mode
12.4 years ago

I'm 100% agreed with Damian's approach. Writing little predicate functions is the best way to be clear about what you're testing. To expand on that approach a bit, if you use a pattern matching library you can decouple the checks from the output and make it explicit how the tests map.

Here's an example in Clojure using BioJava and the core.match library. First define the checking functions:

(ns genotype-compare.core
  (:import [org.biojava3.core.sequence DNASequence])
  (:use [clojure.core.match :only [match]]))

(defn reverse-complement [x]
  (when (contains? #{"G" "A" "T" "C"} (.toUpperCase x))
    (-> (DNASequence. x)
        .getReverseComplement
        .getSequenceAsString)))

(defn same? [{[a1 a2] :a [b1 b2] :b}]
  (and (= a1 b1) (= a2 b2)))

(defn revcomp? [{[a1 a2] :a [b1 b2] :b}]
  (and (= a1 (reverse-complement b2))
       (= a2 (reverse-complement b1))))

(defn inverted? [{[a1 a2] :a [b1 b2] :b}]
  (and (= a1 b2) (= a2 b1)))

With these in place the code to describe the conditions and outputs is clear and concise:

(defn compare-genotypes [a b]
  (match [{:a a :b b}]
    [_ :when same?]     :same    
    [_ :when revcomp?]  :antiparallele
    [_ :when inverted?] :inverted
    :else               :incomaptible))

Some example code to be sure it works:

(defn test-comparisons []
  (letfn [(run-test [a b]
            (println a b "->" (compare-genotypes a b)))]
    (run-test ["A" "C"] ["G" "T"])
    (run-test ["A" "A"] ["A" "A"])
    (run-test ["A" "C"] ["C" "A"])
    (run-test ["A" "0"] ["T" "T"])
    (run-test ["A" "T"] ["-" "0"])))

[A C] [G T] -> :antiparallele
[A A] [A A] -> :same
[G C] [G C] -> :same
[A C] [C A] -> :inverted
[A 0] [T T] -> :incomaptible
[A T] [- 0] -> :incomaptible
ADD COMMENT

Login before adding your answer.

Traffic: 1899 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6