Hello.
I'm trying to map PPI information from STRING DB onto microarray data that has been converted into an adjacency matrix using the bicor()
function from the WGCNA package.
To do that, I check each pair of genes and send requests to STRING's REST API.
Here is the actual code of my function:
mapGenes2PPI <- function( adjacency_matrix )
{
    library(jsonlite)
    resolve_url      <- "http://string-db.org/api/json/resolve?species=9606&format=only-ids&identifier="
    interactions_url <- "http://string-db.org/api/json/interactionsList?identifiers="
    genes_vector <- row.names( adjacency_matrix )
    # Create a mask matrix that will contain either 0 when not interacting or 1 when interaction happens.
    # It starts at 0 so that the diagonal and genes without a STRING id end up as 0 rather than NA.
    mask_matrix <- matrix( 0, nrow = nrow(adjacency_matrix), ncol = ncol(adjacency_matrix),
                           dimnames = dimnames(adjacency_matrix) )
    for ( gene_index in seq_along(genes_vector) )
    {
        gene_name_one <- genes_vector[ gene_index ]
        url_one <- paste0( resolve_url, gene_name_one )
        print( url_one )
        gene_id_one <- fromJSON( url_one )
        for ( column_index in seq_len(ncol(adjacency_matrix)) )
        {
            gene_name_two <- colnames(adjacency_matrix)[ column_index ]
            # Skip self-pairs before sending any request for the second gene
            if ( gene_name_one == gene_name_two )
                next
            url_two <- paste0( resolve_url, gene_name_two )
            print( url_two )
            gene_id_two <- fromJSON( url_two )
            # Because one gene could have one or more STRING ids, we create a table with all possible combinations
            # between the ids of gene_id_one and gene_id_two
            combs <- expand.grid( gene_id_one, gene_id_two, stringsAsFactors = FALSE )
            # For each combination in the combs table (seq_len() skips the loop when combs is empty)
            for ( row in seq_len(nrow(combs)) )
            {
                print( paste0("[+] Checking ", gene_name_one, " [", gene_index, "]", " ( ", combs[row,1], " ) with ", gene_name_two, " [", column_index, "]", " ( ", combs[row,2], " ) ") )
                query_url <- paste0( interactions_url, combs[row,1], "%0D", combs[row,2] )
                print( query_url )
                result <- fromJSON( query_url )
                if ( length(result) != 0 )
                {
                    print( "Genes interact with each other" )
                    print( result )
                    mask_matrix[ gene_index , column_index ] <- 1
                    break
                }
                # Add some delay before the next request
                Sys.sleep(1)
            }
        }
    }
    mask_matrix
}
This function returns a mask_matrix that has the same dimensions as the adjacency matrix, with a value of 1 where an interaction exists and 0 otherwise.
The problem is that it has to check a 20,000 x 20,000 matrix, which is about 4 x 10^8 gene pairs with at least one HTTP request per pair, so it takes far too long. Do you think there is a better, more efficient way to do this calculation?
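
To make the question more concrete, here is a minimal sketch of the kind of bulk approach I am wondering about. It assumes the interacting pairs could be obtained in one go as a two-column table instead of one request per pair. The function name buildMaskFromEdges, the edges data frame and its columns gene_a and gene_b are hypothetical names of mine, not part of STRING or WGCNA, and the input below is made-up toy data.

```r
# Hypothetical helper: fill the 0/1 mask from an edge list in one vectorized step.
# `edges` is assumed to be a data frame with columns gene_a and gene_b (hypothetical
# names) holding the names of genes that interact.
buildMaskFromEdges <- function( adjacency_matrix, edges )
{
    # Only genes present in both the rows and the columns can be mirrored safely
    genes <- intersect( rownames(adjacency_matrix), colnames(adjacency_matrix) )
    mask_matrix <- matrix( 0, nrow = nrow(adjacency_matrix), ncol = ncol(adjacency_matrix),
                           dimnames = dimnames(adjacency_matrix) )
    gene_a <- as.character( edges$gene_a )
    gene_b <- as.character( edges$gene_b )
    # Keep edges whose genes are both in the matrix, and drop self-pairs
    keep <- gene_a %in% genes & gene_b %in% genes & gene_a != gene_b
    gene_a <- gene_a[ keep ]
    gene_b <- gene_b[ keep ]
    if ( length(gene_a) > 0 )
    {
        # A two-column character matrix indexes cells by (row name, column name)
        mask_matrix[ cbind(gene_a, gene_b) ] <- 1
        # Treat interactions as undirected and mirror them
        mask_matrix[ cbind(gene_b, gene_a) ] <- 1
    }
    mask_matrix
}

# Toy input, invented purely for illustration
gene_names <- c("geneA", "geneB", "geneC")
adj <- matrix( 0.5, nrow = 3, ncol = 3, dimnames = list(gene_names, gene_names) )
edges <- data.frame( gene_a = c("geneA", "geneA"),
                     gene_b = c("geneB", "geneNotInMatrix"),
                     stringsAsFactors = FALSE )
mask <- buildMaskFromEdges( adj, edges )
print( mask )
```

With something like this, the cost would depend on the number of known interactions rather than on the number of gene pairs, and the gene names would only need to be resolved to STRING ids once per gene instead of once per pair.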