Hi, thanks for reviewing my question. I have some metagenomic datasets from four sample groups, which I used to predict ORFs containing antibiotic resistance genes. I mapped the reads back to the ORFs and used a formula to calculate the transcripts per million (TPM) for each ORF. I then annotated these ORFs against the nr database using MEGAN to get the bacterial taxa they mapped to. I merged all of this data into a very large Excel spreadsheet with three main columns: Column 1 has the contig node ID, Column 2 has the taxonomy, and Column 3 has the antibiotic resistance gene identified in the ORF. I also have additional columns for the TPM and the absolute counts of the ORFs.
My question is: how can I take this data, test for statistical associations between the taxa and the antibiotic resistance genes identified, and compute a co-occurrence network that shows the correlation between the antibiotic resistance genes and the bacterial hosts that contain them?
A previous study that did this kind of analysis is freely available here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4611512/
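To make the question more concrete, here is a rough sketch of the kind of calculation I have in mind. It is only an illustration under several assumptions, not the method of the linked study. The row layout `(sample, taxon, ARG, TPM)`, the sample labels, and the TPM values are all made up. Spearman rank correlation and the 0.8 cutoff are placeholder choices. It also assumes the spreadsheet can be split so that each row carries a sample label. With only four sample groups, any correlation will rest on very few points.

```python
from collections import defaultdict
from itertools import product
from math import sqrt

# Hypothetical rows: (sample, taxon, resistance gene, TPM). Values are invented.
rows = [
    ("S1", "Escherichia", "tetA", 10.0), ("S2", "Escherichia", "tetA", 20.0),
    ("S3", "Escherichia", "tetA", 30.0), ("S4", "Escherichia", "tetA", 40.0),
    ("S1", "Bacteroides", "ermF", 40.0), ("S2", "Bacteroides", "ermF", 30.0),
    ("S3", "Bacteroides", "ermF", 20.0), ("S4", "Bacteroides", "ermF", 10.0),
]

def abundance_by_sample(rows, key_index, samples):
    """Sum TPM per (feature, sample); return {feature: [TPM in sample order]}."""
    totals = defaultdict(lambda: dict.fromkeys(samples, 0.0))
    for row in rows:
        totals[row[key_index]][row[0]] += row[3]
    return {name: [per[s] for s in samples] for name, per in totals.items()}

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; NaN when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

samples = sorted({row[0] for row in rows})
taxa = abundance_by_sample(rows, 1, samples)   # taxon -> TPM per sample
args = abundance_by_sample(rows, 2, samples)   # ARG   -> TPM per sample

# Every taxon/ARG pair is a candidate edge, weighted by its correlation.
edges = [(t, a, spearman(taxa[t], args[a])) for t, a in product(taxa, args)]

# Keep only strongly positive pairs (0.8 is an arbitrary illustrative cutoff).
strong = [(t, a, rho) for t, a, rho in edges if rho >= 0.8]

for taxon, arg, rho in strong:
    print(f"{taxon}\t{arg}\t{rho:.2f}")
```

The `strong` list is an edge list (taxon, gene, correlation) that could be exported to a network tool for drawing. I have left out significance testing and multiple-testing correction, so advice on how to handle those properly would be appreciated.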