I am trying to make a co-occurrence network graph from presence/absence data of genes per genome, but I am unsure how to go about it. I'm hoping to end up with something like the first image below, where each gene is linked to another gene if both are present in the same genomes, and where a larger circle could indicate a gene with a higher frequency. I originally tried the widyr and tidygraph packages, but I am unsure whether my data is compatible with them (see second image), as it has the BGCs as rows and the individual genomes as columns. I want to examine the presence/absence pattern of each gene pair to determine whether it represents a coincident relationship; basically, whether gene i and gene j are observed together or apart in the input genomes more often than would be expected by chance.
1) Are there any suggestions on what packages/code I could use that would work with my data set, or how I could adapt my data set to work with these packages?
2) Are there any statistical tests you would recommend specifically for establishing whether or not a gene pair has a coincident relationship?
# Example of data set
# rows = genes
# cols = genomes
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]
Thanks in advance.
I have found the following R package quite useful, and I have been able to adapt my data to work with it:
The cooccur R package: https://cran.r-project.org/web/packages/cooccur/index.html
The algorithm calculates the observed and expected frequencies of co-occurrence between each pair of species; in my case, the "species" are the GCFs.
Thanks once again for your help. I thought I would leave this here for other users.
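To make the observed-versus-expected idea concrete, here is a minimal base-R sketch on the example matrix from the question. It is not the cooccur package's own code: it simply assumes that genes are distributed independently across genomes, so the expected count for a pair is N * p_i * p_j. The gene labels are made up for illustration.

```r
# Example presence/absence matrix from the question: rows = genes, cols = genomes
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]
rownames(df) <- paste0("gene", 1:5)  # hypothetical labels, not in the original data

n_genomes <- ncol(df)

# Observed: number of genomes in which both genes of a pair are present.
# Multiplying by 1 turns TRUE/FALSE into 1/0 before the matrix product.
observed <- tcrossprod(df * 1)

# Expected under independence: N * p_i * p_j, where p_i is the fraction
# of genomes that contain gene i.
freq <- rowMeans(df)
expected <- n_genomes * outer(freq, freq)

# Off-diagonal entries are the gene pairs; the diagonal of `observed`
# is just each gene's own frequency count.
round(observed - expected, 2)
```

Pairs with an observed count well above the expected count are candidates for a positive association, and pairs well below it for a negative one.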
As you mentioned, my binary matrix is non-square. Is it possible to change it into a square matrix? The first method provided gave me an error when trying to run:
The second line of code provided, for the incidence matrix, ran without issues, apart from a small warning message.
For question 2 of my post, do you think it is possible to obtain something like a p-value? In this case, the value would represent the strength of association between GCFs.
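One common way to get a p-value per pair is Fisher's exact test on the 2x2 presence/absence table of the two genes. A minimal sketch, assuming the example matrix from the question, made-up gene labels, and Benjamini-Hochberg correction as one reasonable choice for multiple testing:

```r
# Example presence/absence matrix from the question: rows = genes, cols = genomes
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]
rownames(df) <- paste0("gene", 1:5)  # hypothetical labels, not in the original data

# All unordered gene pairs, one pair per row
pairs <- t(combn(rownames(df), 2))

# Fisher's exact test on the 2x2 table of each pair.
# Fixing the factor levels keeps the table 2x2 even if a gene is
# present in all genomes or in none.
p_values <- apply(pairs, 1, function(p) {
  tab <- table(factor(df[p[1], ], levels = c(TRUE, FALSE)),
               factor(df[p[2], ], levels = c(TRUE, FALSE)))
  fisher.test(tab)$p.value
})

result <- data.frame(
  gene_i     = pairs[, 1],
  gene_j     = pairs[, 2],
  p_value    = p_values,
  p_adjusted = p.adjust(p_values, method = "BH")  # multiple-testing correction
)
result
```

The test works directly on the rectangular genes-by-genomes matrix, since each pair only needs its two rows. With only 10 genomes the test has little power, so this is meant to show the mechanics rather than to produce meaningful results on the toy data.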
I will definitely go through the tutorials provided for the statistical analysis. However, is it possible to conduct tests of this kind on non-square matrices, or would it be necessary to convert them to square matrices first? If they are converted, the conversion will most likely introduce NA values; can these be replaced with some other value?
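A rectangular genes-by-genomes matrix can be turned into a square genes-by-genes matrix of co-occurrence counts by multiplying it by its transpose. As long as the input has no missing values, this produces counts rather than NAs. A rough sketch, assuming the igraph package is installed and using made-up gene labels and arbitrary size and width scaling:

```r
library(igraph)

# Example presence/absence matrix from the question: rows = genes, cols = genomes
set.seed(2222)
df <- matrix(sample(c(TRUE, FALSE), 50, replace = TRUE), 5)
colnames(df) <- letters[1:10]
rownames(df) <- paste0("gene", 1:5)  # hypothetical labels, not in the original data

# Square gene x gene matrix: entry [i, j] counts the genomes that
# contain both gene i and gene j.
co <- tcrossprod(df * 1)

# Undirected weighted graph; diag = FALSE drops self-loops, and pairs
# with a count of zero get no edge.
g <- graph_from_adjacency_matrix(co, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Larger circles for genes found in more genomes, thicker edges for
# pairs that co-occur more often (scaling factors are arbitrary).
V(g)$size  <- 10 + 3 * rowSums(df)
E(g)$width <- E(g)$weight

plot(g)
```

Raw counts only show how often genes co-occur, not whether that is more than expected by chance, so in practice the edges could be filtered by a per-pair test result before plotting.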