Hi,
I am trying to run a Differential methylation analysis with RnBeads as I have done previously on a similar fashion. This time I am just running it on my HPC and it seems to run just fine until it arrives to the differential methylation step. See the error output below:
Loading required package: BiocParallel
opening ff /local/scratch/96814/RtmpmMxVBc/ff/ff1a07b2bb247d.ff
opening ff /local/scratch/96814/RtmpmMxVBc/ff/ff1a07b6ddc2074.ff
Coordinate system already present. Adding new coordinate system, which will replace the existing one.
**Error in rnb.sample.groups(x, columns = pheno.cols, columns.pairs = columns.pairs) :
invalid value for columns.pairs; not all names are present in the annotation
Calls: rnb.run.analysis ... rnb.execute.computeDiffMeth -> get.comparison.info -> rnb.sample.groups
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted**
I suspect it is a naming issue so this is the first line of my samplesheet - I tried attaching it as an image but the website broke on me. I have a total of 51 samples and 3 groups: Non inflammed vs Responders, Non inflammed vs Non responders and Responders vs Non responders.
I have used Res, Non_Res and Non to identify the different samples and when I did not want a sample to be used in a comparison I used NA
filename_bed Sample_ID Sample_Group Non_Inflamed_vs_Res Non_Inflamed_vs_Non_Res Res_vs_Non_Res
PN0272_044_S44_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph PN0272_044_S44_L001_R1_001_SUDV27_A Res Res NA Res
PN0272_045_S45_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph PN0272_045_S45_L001_R1_001_SUDV27_B Res Res NA Res
All the 51 files as they appear in the folder where the samplesheet.txt is located follow this naming structure:
PN0192_0001_S1_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph
Again, we know it is ugly and redundant but we were given the files and my colleague did not want to change the names.
And finally, below is the code I am using to run this differential methylation analysis:
library(RnBeads)
library(RnBeads.hg19)
library(dplyr)
library(purrr)
library(grid)
library(readr)
library(sva)
data.dir <- ("/home/sdata/epigastro_alejandra/store2/arodriguez/RnBeads_Lola_Analysis/AllLocation_responders/")
bed.dir<-file.path(data.dir) #just for loading SAMPLES for RnBeads
sample.annotation<-file.path(data.dir, "samplesheet.txt") #specify name comparisons
analysis.dir<-file.path(data.dir, "AllLoc_Analysis") # specify name of analysis directory
report.dir<- file.path(analysis.dir,"AllLoc_Report") # specify name of report directory
####### SETTING PARAMETERS ####################################################
rnb.options(analysis.name = "AllLoc_analysis") #Name this analysis
rnb.options(email="alejandrarodrigu21@rcsi.com") #email contact for this analysis
rnb.options(logging = TRUE) #Default setting TRUE: Creating records of the analysis
rnb.options(identifiers.column="Sample_ID", assembly="hg19", import.bed.style="bismarkCov", import.table.separator="\t")
####### COVARIATE ####################################################
rnb.options(inference=TRUE)
rnb.options(inference.targets.sva="Batch", inference.sva.num.method="be")
rnb.options(inference.age.prediction=FALSE)
rnb.options(covariate.adjustment.columns=c("Age", "Gender"))
####### DIFFENTIAL MET ANALYSIS
#APPLY BATCH CORRECTION
rnb.options(differential.adjustment.sva=TRUE)
#LOOK FOR IMMUNE CELL CONTENT
rnb.options(differential.comparison.columns.all.pairwise = c("Histology","Stage_sum"))
rnb.options(export.types=c("sites"))
rnb.options()
rnb.getOption("assembly")
rnb.options("differential.enrichment.go"=FALSE)
rnb.options("differential.enrichment.lola"=FALSE)
rnb.options(export.to.csv=TRUE)
rnb.options(differential.comparison.columns="Histology", columns.pairing=c("Histology"="Patient_ID"))
gc()
analysis <- rnb.run.analysis(dir.reports=report.dir,initialize.reports=TRUE, sample.sheet=sample.annotation, data.source=bed.dir, data.type="bs.bed.dir")
I thought that maybe you can see something that I am not seeing right now. This is the end bit of the .log file:
2023-09-01 16:44:01 47.0 STATUS COMPLETED Methylation Value Distributions - Site Categories
2023-09-01 16:44:01 47.0 STATUS STARTED Sample Clustering
2023-09-01 16:44:01 47.0 STATUS STARTED Agglomerative Hierarchical Clustering
2023-09-01 16:45:04 49.7 STATUS Performed clustering on sites using correlation as a distance metric
2023-09-01 16:46:18 49.7 STATUS Performed clustering on sites using manhattan as a distance metric
2023-09-01 16:47:32 49.7 STATUS Performed clustering on sites using euclidean as a distance metric
2023-09-01 16:47:36 49.7 STATUS Performed clustering on tiling using correlation as a distance metric
2023-09-01 16:47:40 49.7 STATUS Performed clustering on tiling using manhattan as a distance metric
2023-09-01 16:47:45 49.7 STATUS Performed clustering on tiling using euclidean as a distance metric
2023-09-01 16:47:46 49.7 STATUS Performed clustering on genes using correlation as a distance metric
2023-09-01 16:47:46 49.7 STATUS Performed clustering on genes using manhattan as a distance metric
2023-09-01 16:47:47 49.7 STATUS Performed clustering on genes using euclidean as a distance metric
2023-09-01 16:47:47 49.7 STATUS Performed clustering on promoters using correlation as a distance metric
2023-09-01 16:47:48 49.7 STATUS Performed clustering on promoters using manhattan as a distance metric
2023-09-01 16:47:48 49.7 STATUS Performed clustering on promoters using euclidean as a distance metric
2023-09-01 16:47:49 49.7 STATUS Performed clustering on cpgislands using correlation as a distance metric
2023-09-01 16:47:49 49.7 STATUS Performed clustering on cpgislands using manhattan as a distance metric
2023-09-01 16:47:50 49.7 STATUS Performed clustering on cpgislands using euclidean as a distance metric
2023-09-01 16:47:50 49.7 STATUS COMPLETED Agglomerative Hierarchical Clustering
2023-09-01 16:47:52 51.6 STATUS STARTED Clustering Section
2023-09-01 16:47:55 52.7 STATUS STARTED Generating Heatmaps
2023-09-01 16:47:56 52.7 STATUS STARTED Region type: sites
2023-09-01 16:57:02 50.0 STATUS COMPLETED Region type: sites
2023-09-01 16:57:02 50.0 STATUS STARTED Region type: tiling
2023-09-01 17:06:06 46.9 STATUS COMPLETED Region type: tiling
2023-09-01 17:06:06 46.9 STATUS STARTED Region type: genes
2023-09-01 17:15:30 46.9 STATUS COMPLETED Region type: genes
2023-09-01 17:15:30 46.9 STATUS STARTED Region type: promoters
2023-09-01 17:24:43 46.9 STATUS COMPLETED Region type: promoters
2023-09-01 17:24:43 46.9 STATUS STARTED Region type: cpgislands
2023-09-01 17:34:01 46.9 STATUS COMPLETED Region type: cpgislands
2023-09-01 17:34:01 46.9 STATUS Created 540 heatmaps based on the clustering results
2023-09-01 17:34:01 46.9 STATUS COMPLETED Generating Heatmaps
2023-09-01 17:34:01 46.9 STATUS STARTED Adding Color Legends
2023-09-01 17:38:30 46.9 STATUS COMPLETED Adding Color Legends
2023-09-01 17:38:30 46.9 STATUS STARTED Estimating Optimal Numbers of Clusters
2023-09-01 17:39:27 46.9 STATUS Estimated number of clusters based on mean silhouette value
2023-09-01 17:39:27 46.9 STATUS COMPLETED Estimating Optimal Numbers of Clusters
2023-09-01 17:39:27 46.9 STATUS STARTED Overlapping Clusters with Sample Traits
2023-09-01 17:39:27 46.9 STATUS Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_1.csv
2023-09-01 17:39:27 46.9 STATUS Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_2.csv
2023-09-01 17:39:27 46.9 STATUS Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_3.csv
2023-09-01 17:39:27 46.9 STATUS Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_4.csv
2023-09-01 17:39:27 46.9 STATUS Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_5.csv
2023-09-01 17:40:23 46.9 STATUS COMPLETED Overlapping Clusters with Sample Traits
2023-09-01 17:40:23 46.9 STATUS COMPLETED Clustering Section
2023-09-01 17:40:23 46.9 STATUS COMPLETED Sample Clustering
2023-09-01 17:40:23 46.9 STATUS COMPLETED Exploratory Analysis
2023-09-01 17:40:24 46.9 INFO Initialized report index and saved to index.html
2023-09-01 17:40:24 46.9 STATUS STARTED Differential Methylation
2023-09-01 17:40:24 46.9 INFO Number of cores: 1
2023-09-01 17:40:24 46.9 STATUS STARTED Analysis
2023-09-01 17:40:24 46.9 INFO Using 0 permutation tests
2023-09-01 17:40:24 46.9 INFO Using columns: Histology
2023-09-01 17:40:24 46.9 INFO Using region types: tiling,genes,promoters,cpgislands
2023-09-01 17:40:24 46.9 STATUS STARTED Retrieving comparison info
I tried attaching the whole file but I cannot - however I'd be happy to copy the contents if someone believes there would be something relevant in there. I have checked all of it though and I had no errors whatsoever and everything seemed to be running smoothly. I have the .html for quality control, exploratory analysis, etc., but when it arrives to differential methylation, it halts.
Since it says "not all names are present in the annotation", this is what makes me think that maybe it is a naming convention issue, but I really cannot see what is the issue with it, so I'd appreciate suggestions on how to figure out this issue!
Error in rnb.sample.groups(x, columns = pheno.cols, columns.pairs = columns.pairs) :
invalid value for columns.pairs; not all names are present in the annotation
Thanks,
Alejandra