If I were to do it now, I would pick either of the following two methods:
1) Calculate the distance for each node pair using a graph library
This approach requires you to convert the data into a graph model for your library of choice. I previously did something similar with Jung, which is Java-based, and was really happy with it. It is hard to estimate the required memory, but I was able to calculate the shortest-path length for all possible node pairs on a human PPI network I obtained from GeneMania in a few hours on a standard laptop, so it was not that bad.
If you want to see an example, you can check out this repository for an old project of mine and you will find my Java implementation that utilizes Jung (the documentation is a little bit scarce, but the code is simple):
https://bitbucket.org/armish/pathwayguy
and here is the code that calculates the pairwise shortest-path distances:
AbstractPairwiseDistanceMethod.java
I am guessing you can also use the R library igraph for this calculation; I have never tried it, but from what I have heard from my colleagues, it is a good alternative to Jung (especially if you are an R person).
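To make the idea behind the first method concrete without depending on Jung or igraph, here is a minimal sketch in plain Python. It is not the code from the repository above; it only illustrates what a graph library does for an unweighted, undirected network: one breadth-first search per node. The function names (`build_adjacency`, `bfs_distances`, `all_pairs_shortest_paths`) and the toy edge list are made up for this example, and I am assuming the interaction file has already been parsed into pairs of node names.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def bfs_distances(adjacency, source):
    """Return {node: number of edges on the shortest path from source}."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def all_pairs_shortest_paths(edges):
    """Run one BFS per node; unreachable pairs are simply absent."""
    adjacency = build_adjacency(edges)
    return {node: bfs_distances(adjacency, node) for node in list(adjacency)}


# Hypothetical toy network: a chain A-B-C-D plus a separate pair E-F.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
toy_distances = all_pairs_shortest_paths(toy_edges)
print(toy_distances["A"]["D"])
```

Pairs in different connected components (such as A and E here) do not appear in the result at all, so you will need to decide how to represent them (e.g. as NA) if you want a full table.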
2) Work with an adjacency matrix
This requires you to convert the interaction file into an adjacency matrix (all nodes vs all nodes) and then multiply it by itself repeatedly, i.e. take powers. I believe this is also not that memory intensive, especially if you go with a matrix multiplication method that is optimized for sparse matrices (R and Matlab both have many of those). For this alternative, the idea goes like this:
- You create the adjacency matrix A; every non-zero value in a cell indicates a distance of 1 (a direct interaction) for the corresponding node pair (A^1)
- You multiply the matrix by A; every cell that is non-zero now but was zero in all earlier powers indicates a distance of 2 for the corresponding node pair (A^2)
- You multiply the result by A again; every cell that becomes non-zero for the first time indicates a distance of 3 for the corresponding node pair (A^3)
- You multiply the result by A again; every cell that becomes non-zero for the first time indicates a distance of 4 for the corresponding node pair (A^4)
...
It is not a great approach, but surprisingly, people who are familiar with matrix-based operations in R or Matlab find it easier to implement.
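The matrix-power steps above can be sketched roughly as follows. This is plain Python with nested lists rather than a sparse-matrix library, so it only shows the logic and will not scale to a real PPI network; the function names and the toy edge list are hypothetical. Note that the sketch counts a direct interaction as distance 1 (the number of edges on the shortest path), records a pair's distance the first time its cell becomes non-zero, and uses 0/1 "is there a walk" entries instead of walk counts so the numbers do not blow up.

```python
def multiply_boolean(left, right):
    """Boolean matrix product: 1 where at least one connecting walk exists."""
    n = len(left)
    return [
        [1 if any(left[i][m] and right[m][j] for m in range(n)) else 0
         for j in range(n)]
        for i in range(n)
    ]


def shortest_paths_by_matrix_powers(edges):
    """Return (nodes, distance); distance[i][j] is None if j is unreachable."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)

    # Symmetric 0/1 adjacency matrix (undirected interactions assumed).
    adjacency = [[0] * n for _ in range(n)]
    for a, b in edges:
        adjacency[index[a]][index[b]] = 1
        adjacency[index[b]][index[a]] = 1

    distance = [[None] * n for _ in range(n)]
    for i in range(n):
        distance[i][i] = 0

    power = [row[:] for row in adjacency]  # this is A^1
    for k in range(1, n):  # a shortest path has at most n - 1 edges
        updated = False
        for i in range(n):
            for j in range(n):
                # First power at which the cell is non-zero = distance.
                if power[i][j] and distance[i][j] is None:
                    distance[i][j] = k
                    updated = True
        if not updated:
            break  # no pair at distance k, so none further away either
        power = multiply_boolean(power, adjacency)  # now A^(k+1)
    return nodes, distance


# Hypothetical toy network: a chain A-B-C-D plus a separate pair E-F.
toy_nodes, toy_distance = shortest_paths_by_matrix_powers(
    [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
)
print(toy_distance[toy_nodes.index("A")][toy_nodes.index("D")])
```

In R or Matlab you would replace `multiply_boolean` with the built-in (sparse) matrix product and test for non-zero cells in the same way.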
"data frame" suggests you want a solution in R?
yes Neil, I know him: that's what pfbusson wants.