Hello everyone,
Using the dist.hamming function of 'phangorn' in R, I have created a matrix of raw Hamming distance scores for a nucleotide alignment. Let's call this object 'matrix'.
I would like to remove any rows in this matrix object that contain a Hamming distance equal to or less than a given value, in any column.
I am aware this may read as a fairly basic R question, but I have not found a way to do this.
For example, I found this suggestion:
matrixless <- matrix[rowSums(matrix>=100)==ncol(matrix),]
But this seems to return zero rows regardless of the value I use.
Other ways that come to mind involve looping, and I'm sure it doesn't need to be that complex.
I would appreciate any help you may have.
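A likely reason the quoted rowSums line returns zero rows, assuming the object is the full square matrix (for example from as.matrix() on the dist.hamming result): each sequence's distance to itself is 0, so every row contains at least one value below the cutoff and no row can satisfy the `== ncol(matrix)` test. Below is a minimal sketch with invented distances; the names `m`, `off_diag`, and `too_close` are illustrative only. It masks the diagonal before comparing.

```r
# Toy symmetric distance matrix; the values are invented for illustration.
ids <- c("s1", "s2", "s3")
m <- matrix(c(0, 1, 5,
              1, 0, 4,
              5, 4, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, ids))

threshold <- 1

# The quoted test rejects every row, because the diagonal zero is never >= 100.
n_pass <- nrow(m[rowSums(m >= 100) == ncol(m), , drop = FALSE])

# Mask the diagonal, then flag rows with any remaining distance <= threshold.
off_diag <- m
diag(off_diag) <- NA
too_close <- rowSums(off_diag <= threshold, na.rm = TRUE) > 0
```

Here `n_pass` is 0, and `too_close` flags s1 and s2 (which are 1 apart) but not s3.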
Can you be more specific?
So do you want to remove any row that has a value less than or equal to some value in its corresponding column? Can you give a small toy example?
Sure.
Let us say we have an alignment of 3 sequences as a small example, with Hamming distances given as follows in a matrix:
Let us say that I think a Hamming distance of 1 is too small. I would like to remove any row with a Hamming distance equal to or less than 1 (I am aware that it cannot be lower than one in this example, but just go along with me here). Ideally, I'd like row one to be gone.
In reality, I have an alignment of around 6000 sequences, and I am trying to isolate a more manageable subset of these that is most representative of the diversity in the alignment.
This is unlikely to be a correct example: at the very least, the diagonal should be 0! And Hamming distance should be symmetric.
Btw, there's a problem here: you have created a distance matrix, so it should be symmetric, shouldn't it? dist(A,B) = dist(B,A)
Then you cannot remove only rows; you need to remove the corresponding columns as well, to keep the matrix symmetric!
Apologies, yes it was a clumsy example, and I have edited what I had first added - of course the real matrix is symmetrical and I would need to remove both rows and columns.
With this in mind, how do I do this?
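One possible way to do this, sketched under the assumption that the distances are held as a full square matrix with a zero diagonal. Both helper names (`drop_close`, `greedy_keep`) are hypothetical and the toy values are invented. The first helper does what was literally asked, indexing rows and columns with the same logical vector so the result stays symmetric. Note that it drops both members of a close pair. The second is a swapped-in alternative not proposed in this thread: a simple greedy pass that keeps one member of each close group, which may suit the goal of a representative subset better.

```r
# Hypothetical helper: drop every sequence that has any off-diagonal
# distance <= threshold, removing its row AND its column together.
drop_close <- function(d, threshold) {
  off_diag <- d
  diag(off_diag) <- NA                       # ignore self-distances
  keep <- rowSums(off_diag <= threshold, na.rm = TRUE) == 0
  d[keep, keep, drop = FALSE]                # same index on both dimensions
}

# Hypothetical alternative: greedy pass that keeps a sequence only if it is
# farther than threshold from every sequence kept so far.
greedy_keep <- function(d, threshold) {
  kept <- integer(0)
  for (i in seq_len(nrow(d))) {
    if (all(d[i, kept] > threshold)) kept <- c(kept, i)
  }
  d[kept, kept, drop = FALSE]
}

# Toy symmetric distance matrix; the values are invented for illustration.
ids <- c("s1", "s2", "s3")
m <- matrix(c(0, 1, 5,
              1, 0, 4,
              5, 4, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(ids, ids))

strict <- drop_close(m, threshold = 1)    # keeps only s3
greedy <- greedy_keep(m, threshold = 1)   # keeps s1 and s3
```

With real data, the input would be something like `as.matrix(dist.hamming(aln, ratio = FALSE))`, where `aln` stands for your phyDat alignment. The greedy result depends on the order of the sequences, so it is one representative subset rather than a unique answer.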