When running the exploratory data analysis step (rnb.run.exploratory), I get an error message in the heatmap creation step:
2020-06-02 03:26:26 62.5 STATUS COMPLETED Agglomerative Hierarchical Clustering
2020-06-02 03:26:29 62.5 STATUS STARTED Clustering Section
2020-06-02 03:26:36 62.5 STATUS STARTED Generating Heatmaps
2020-06-02 03:26:38 62.5 STATUS STARTED Region type: sites
Xlib: request 18 length 24 would exceed buffer size.
*** caught segfault ***
address 0x4, cause 'memory not mapped'
Traceback:
1: dev.off(dev.copy(device = pdf, file = tmp, width = width, height = height, pointsize = pointsize, paper = "special", ...))
2: dev2bitmap(fname, type = "pngalpha", height = .Object@height, width = .Object@width, method = "pdf", ...)
3: doTryCatch(return(expr), name, parentenv, handler)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
5: tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
6: doTryCatch(return(expr), name, parentenv, handler)
7: tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), names[nh], parentenv, handlers[[nh]])
8: tryCatchList(expr, classes, parentenv, handlers)
9: tryCatch(dev2bitmap(fname, type = "pngalpha", height = .Object@height, width = .Object@width, method = "pdf", ...), warning = function(e) { if (grepl(" had status 1$", e$message)) { doerror(e) } else if (logger.isinitialized()) { logger.warning(e$message) } else { invisible(e$message) }}, error = doerror)
10: convert.f(fname, res = .Object@low.png)
11: .local(.Object, ...)
12: off(rplot)
13: off(rplot)
14: rnb.section.clustering.add.heatmap(report, X, fname, TRUE, clust.result, sample.ids, locus.colors.cur, sample.colors)
15: rnb.section.clustering(report, rnb.set, clust.results, rinfo, clust.edited)
16: rnb.step.clustering.internal(rnb.set, report, rinfos)
17: rnb.run.exploratory(rnb.set = rnb.set, dir.reports = report.dir)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
I have tried several times with different numbers of cores, as I thought this might be the problem. It doesn't work.
The parameters for the run are below.
Any ideas what the problem is in this run?
I have done the same run before using the command for the complete run, though several of the parameters were not set in that run.
The command was:
rnb.run.analysis(dir.reports = report.dir, sample.sheet = sample_annotations,
                 data.dir = bed.dir, data.type = "bs.bed.dir",
                 build.index = TRUE, save.rdata = TRUE, initialize.reports = TRUE)
The parameters I added in the step-by-step analysis are:
rnb.options(
# filtering step
filtering.coverage.threshold = 5,
filtering.low.coverage.masking = TRUE,
filtering.greedycut = FALSE,
filtering.missing.value.quantile = 0.5,
filtering.high.coverage.outliers = TRUE,
# analyzed regions
region.types = c("genes", "promoters", "tiling1kb", "ensembleRegBuildBPall", "tiling200bp", "gencode22promoters", "gencode22genes")
)
Is this error explainable? Does it have anything to do with the filtering steps, or with the new regions I added?
Is there a package missing?
Thanks
Assa
Parameters for the run:
num.cores = 12
parallel.setup(num.cores)
rnb.options(
analysis.name = "Arno A. Bi-Sulphite Seq data analysis local step-by-step",
email = "yeroslaviz@biochem.mpg.de",
identifiers.column = "sampleID",
replicate.id.column = "treatment",
import.bed.style = "bismarkCov",
assembly = "hg38",
# Finetune color scales and plot themes.
colors.meth = c("#EDF8B1","#41B6C4","#081D58"),
colors.category = c("#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E",
"#E6AB02","#A6761D","#666666","#2166AC","#B2182B",
"#00441B","#40004B","#053061"),
qc.coverage.plots = TRUE,
qc.coverage.histograms = TRUE,
qc.coverage.violins = TRUE,
# filtering step
filtering.coverage.threshold = 5,
filtering.low.coverage.masking = TRUE,
filtering.greedycut = FALSE,
filtering.missing.value.quantile = 0.5,
filtering.high.coverage.outliers = TRUE,
# Surrogate variables factor analysis (Covariates)
inference = TRUE,
inference.targets.sva = "treatment", # Column names in the sample annotation table for which surrogate variable analysis (SVA) should be conducted.
inference.sva.num.method = "be", # Which function should be used to estimate the number of surrogate variables.
differential.comparison.columns = c("WT_GID4", "WT_MAEA", "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis.
differential.comparison.columns.all.pairwise = c("WT_GID4", "WT_MAEA", "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis in which all pairwise comparisons between groups should be conducted (the default is "one vs all" if multiple groups are specified in a column).
region.types = c("genes", "promoters", "tiling1kb", "ensembleRegBuildBPall", "tiling200bp", "gencode22promoters", "gencode22genes"), # Region types to carry out analysis on (this would remove the analysis for sites [done only to shorten the process.])
differential.site.test.method = "limma", # Method to be used for calculating p-values on the site level.
differential.variability = TRUE, # With this analysis, the variance inside each group is used to detect differences among them.
differential.variability.method = "diffVar",
differential.enrichment.go = TRUE, # whether Gene Ontology (GO)-enrichment analysis is to be conducted on the identified differentially methylated regions.
differential.enrichment.lola = TRUE, # whether LOLA-enrichment analysis is to be conducted on the identified differentially methylated regions.
differential.enrichment.lola.dbs = c("${LOLACore}"), # database for LOLA enrichment analysis
export.to.trackhub = c("bigBed","bigWig"), # create tracks and hub structure
export.to.csv = TRUE,
# several parameters for better memory management and parallel processing
disk.dump.big.matrices = TRUE,
disk.dump.bigff = TRUE,
logging.disk = TRUE,
enforce.memory.management = TRUE
)
theme_set(theme_bw())
Thanks for the fast response.
I don't think this is a RAM problem.
I'll try with a new session, and if that doesn't work, I'll deactivate the heatmaps. But the problem is that it did work when running the previous command, rnb.run.analysis. Why is it different now?

Hi again,
I'm not sure, but it looks strange to me. When I run the separate commands for the clustering, it works fine.
If I understand the commands from the log files correctly, this is what caused the problems before, isn't it?
Hi. The issue is probably in the plotting itself, not in the computations. The execute commands involve no plotting, while the run commands do. Plotting heatmaps can be a challenging task on its own when there are many CpGs/regions to be plotted. I guess this is what causes the error.
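A rough way to see why many CpGs/regions are a problem: heatmap.2 (and base stats::heatmap) cluster the rows by default, which builds a full distance matrix with stats::dist(). That object stores the lower triangle, n(n - 1)/2 doubles of 8 bytes each, so memory grows quadratically with the number of rows. This is a minimal sketch assuming the distance matrix is the dominant allocation (hclust's own working memory is ignored); the helper names dist.bytes and rows.for.gb are hypothetical.

```r
# Approximate bytes needed by stats::dist() over n rows:
# the lower triangle holds n * (n - 1) / 2 doubles, 8 bytes each.
dist.bytes <- function(n) {
  n <- as.numeric(n)  # use doubles to avoid integer overflow for large n
  n * (n - 1) / 2 * 8
}

# Inverse: how many rows would produce an allocation request of `gb`?
# R's "cannot allocate vector of size ... Gb" message uses 1024^3 bytes per Gb.
rows.for.gb <- function(gb) {
  elements <- gb * 1024^3 / 8
  # positive root of n^2 - n - 2 * elements = 0
  (1 + sqrt(1 + 8 * elements)) / 2
}

dist.bytes(20000) / 1024^3  # about 1.49 Gb for 20,000 rows
rows.for.gb(2974245.6)      # roughly 2.8e7 rows
```

If that assumption holds, the 2974245.6 Gb request reported for the "sites" data corresponds to roughly 28 million rows, which is on the order of the number of CpG sites in the human genome, so clustering every site is out of reach on any machine.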
So let's say I would like to have the heatmaps. What command do I use? Or can I only do it within R?
Good morning, another thing I can't understand is that the run completed without errors when running the whole pipeline with the rnb.run.analysis command, so why does it cause problems now?

Hi! To your questions:
rnb.run.analysis command, but not for the exploratory module individually, I don't know. I don't even have a clue what the issue might be, except for some unexpected memory issues. I would need to have the dataset at hand to reproduce the error in order to solve it.

OK, thanks,
As for the heatmaps via meth() and heatmap, I already managed to make some. To make the heatmaps, one must make sure that the matrix contains only complete.cases(), as heatmap.2 can't handle NaN.
For the "sites" data I get the error:
Error: cannot allocate vector of size 2974245.6 Gb
so I guess there are also some memory problems. It still doesn't explain why it worked before.

I'll try to re-run the analysis in a step-by-step manner again, but with the same parameters as before (no filtering, etc.). Hopefully I get a different result.
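A minimal sketch of the workaround described above: take the methylation matrix, drop rows with missing values via complete.cases() (which treats NaN as missing), and, as an extra step not mentioned in the thread, keep only the most variable rows so that row clustering stays small. The random matrix is a hypothetical stand-in for the real meth() output; the 100-row cutoff and the output file name are arbitrary choices. The code uses gplots::heatmap.2 when it is installed and falls back to base stats::heatmap otherwise.

```r
# Hypothetical stand-in for the output of meth(): a random beta-value matrix
# (rows = regions, columns = samples) with some NaN entries.
set.seed(1)
betas <- matrix(runif(200 * 6), nrow = 200,
                dimnames = list(NULL, paste0("sample", 1:6)))
betas[sample(length(betas), 30)] <- NaN

# Keep only rows without missing values; complete.cases() counts NaN as missing.
betas.complete <- betas[complete.cases(betas), , drop = FALSE]

# Keep the most variable rows (arbitrary cutoff) so clustering stays cheap.
top.n <- min(100, nrow(betas.complete))
row.var <- apply(betas.complete, 1, var)
betas.top <- betas.complete[order(row.var, decreasing = TRUE)[seq_len(top.n)], ,
                            drop = FALSE]

# Write the heatmap to a PDF in the session's temporary directory.
out.file <- file.path(tempdir(), "promoter_heatmap.pdf")
pdf(out.file)
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(betas.top, trace = "none", density.info = "none")
} else {
  stats::heatmap(betas.top, scale = "none")
}
dev.off()
```

With real data, betas would come from something like meth(rnb.set, type = "promoters"). For the "sites" level, a row filter of this kind is what would keep the clustering from requesting a distance matrix over every CpG.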