In Python (2 or 3), given the following table:
     tum1 tum2 tum3 tum4
mut1    1    1    0    1
mut2    1    1    0    1
mut3    0    0    1    1
mut4    1    1    0    1
We can write each sample (column) as a binary number, reading its values from mut1 down to mut4: tum1 = 1101 (binary) = 13 (decimal). We can then use bitwise operators to compare those numbers:
# Original vectors
tum1 = int('1101', 2)
tum2 = int('1101', 2)
tum3 = int('0010', 2)
tum4 = int('1111', 2)
# Let's print tum1
tum1
> 13
#Bitwise comparison
bin(tum1 & tum2)[2:]
> '1101'
bin(tum1 & tum3)[2:]  # bin() drops leading zeros, so 0000 is printed as '0'
> '0'
bin(tum1 & tum4)[2:]
> '1101'
A '0' at position 'n' means the two samples do not share mutation 'n'. Conversely, a '1' at position 'n' means the two samples share mutation 'n'.
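Because bin() drops leading zeros, position 'n' only lines up with mutation 'n' if the result is padded to a fixed width. Here is a minimal sketch of that idea, assuming the four-mutation table above. The `mutations` list and the `shared_mutations` helper are illustrative names, not part of the original answer:

```python
# Mutation names in the same order as the bits (leftmost bit = mut1).
mutations = ['mut1', 'mut2', 'mut3', 'mut4']

tum1 = int('1101', 2)
tum3 = int('0010', 2)
tum4 = int('1111', 2)

def shared_mutations(a, b, names):
    """Return the names of the mutations set in both a and b (hypothetical helper)."""
    # Zero-pad to len(names) bits so that position n maps to names[n].
    bits = format(a & b, '0{}b'.format(len(names)))
    return [name for name, bit in zip(names, bits) if bit == '1']

print(shared_mutations(tum1, tum4, mutations))  # ['mut1', 'mut2', 'mut4']
print(shared_mutations(tum1, tum3, mutations))  # []
```

Padding with format() rather than slicing bin() keeps the string length equal to the number of mutations, even when the leading samples share nothing.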
We now have the mutations common to each pair of tumors. However, we would also like to know which pair shares the most mutations. The following Python expression sums the digits of the results computed above:
sum(map(int, bin(tum1 & tum2)[2:]))
> 3
sum(map(int, bin(tum1 & tum3)[2:]))
> 0
sum(map(int, bin(tum1 & tum4)[2:]))
> 3
We can see that the pairs (tum1, tum2) and (tum1, tum4) each have 3 mutations in common, and that the pair (tum1, tum3) has none.
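To find the pair with the most shared mutations without typing each comparison by hand, the same count can be applied to every pair. This is only a sketch: the `tumors` dictionary and the `count_shared` helper are assumed names, and counting '1' characters is used here as an equivalent of summing the digits:

```python
from itertools import combinations

tumors = {
    'tum1': int('1101', 2),
    'tum2': int('1101', 2),
    'tum3': int('0010', 2),
    'tum4': int('1111', 2),
}

def count_shared(a, b):
    """Number of mutations present in both samples (hypothetical helper)."""
    # Counting the '1' characters is equivalent to summing the digits.
    return bin(a & b).count('1')

# Score every unordered pair of samples, then sort by shared count, highest first.
pair_counts = [
    (count_shared(tumors[x], tumors[y]), x, y)
    for x, y in combinations(sorted(tumors), 2)
]
pair_counts.sort(key=lambda item: -item[0])

for n_shared, x, y in pair_counts:
    print('{} {} {}'.format(x, y, n_shared))
```

With the example table, the three pairs among tum1, tum2 and tum4 come first with 3 shared mutations each.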
In terms of time, I tried this with two 100,000-value-long vectors composed of random '0's and '1's, and the comparison seemed instant. Reversing and joining the table was the most time-consuming step.
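If "reversing and joining" is read as transposing the table so that each tumor becomes one string of digits, that step could look roughly like this. It assumes the table is already held as a list of rows; `header` and `rows` are illustrative names, and this is not necessarily how the timing above was obtained:

```python
# Rows are mutations, columns are tumors, as in the table above.
header = ['tum1', 'tum2', 'tum3', 'tum4']
rows = [
    [1, 1, 0, 1],  # mut1
    [1, 1, 0, 1],  # mut2
    [0, 0, 1, 1],  # mut3
    [1, 1, 0, 1],  # mut4
]

# zip(*rows) transposes the table: each tuple is one tumor's column.
# Joining the digits gives the bit string, which int(..., 2) parses.
tumors = {
    name: int(''.join(str(v) for v in column), 2)
    for name, column in zip(header, zip(*rows))
}

print(tumors['tum1'])  # 13
```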
Hi,
I'm wondering whether the only possible values in your matrix are 0 and 1. If so, have you tried treating your matrix as a vector of binary numbers that can be compared bitwise with one another?
Could you give a bit more detail on how you currently do this?
I created a powerset from the mutations that appear more than once and checked whether a given combination is present in more than one sample. As you can imagine, this is not computationally efficient.
In this case, the possible values in the matrix are only 0 and 1, that is correct. However, I have no idea how to implement what you are suggesting in Python or R.