Hi, I'm trying to run a Bioconductor variant annotation workflow, similar to that described here, but I'm having trouble running my code in parallel over data from multiple patients' samples.
I've narrowed the problem down to the example shown below. Effectively, I can't call locateVariants() within an mclapply() call, presumably because the calls try to use the same database handle in parallel (apologies if I've misunderstood the problem, and indeed if it's trivial). I was wondering whether there is a simple way to implement this in parallel within R.
## testcode
library('parallel')  # provides mclapply()
library('VariantAnnotation')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Seqinfo taken from the TxDb, shared by both patients' ranges
si <- Seqinfo(
    seqnames = names(genome(txdb)),
    genome = genome(txdb))

# one single-base position on each of three chromosomes, per patient
pos.pt1 <- GRanges(
    seqnames = c('chr1', 'chr2', 'chr3'),
    ranges = IRanges(start = rep(10^6, 3), end = rep(10^6, 3)),
    seqinfo = si
)
pos.pt2 <- GRanges(
    seqnames = c('chr11', 'chr12', 'chr13'),
    ranges = IRanges(start = rep(10^6, 3), end = rep(10^6, 3)),
    seqinfo = si
)
pos.list <- list(pt1 = pos.pt1, pt2 = pos.pt2)

# serial version
lapply(pos.list, function(gr){
    locateVariants(gr, txdb, AllVariants())
}) # runs fine, aside from a couple of warnings

# parallel version
mclapply(pos.list, function(gr){
    locateVariants(gr, txdb, AllVariants())
}) # fails with error:
Warning message:
In mclapply(pos.list, function(gr) { :
scheduled core 1 encountered error in user code, all values of the job will be affected
and this result (the result for patient 2 was the same as in the lapply version):
$pt1
[1] "Error in sqliteFetch(rs, n = -1) : \n rsqlite_query_fetch: failed: database disk image is malformed\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in sqliteFetch(rs, n = -1): rsqlite_query_fetch: failed: database disk image is malformed>
All the best
Russ
Liverpool
FYI: don't use sqlite3 with NFS: http://www.sqlite.org/faq.html
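If the cause really is the forked workers sharing one SQLite handle, as suspected in the question, one thing that might be worth trying is to give each worker its own connection. The sketch below is untested against this exact setup and makes two assumptions: that saveDb() and loadDb() from AnnotationDbi are available in your installation, and that tempdir() sits on a local disk rather than NFS. The names txdb.file and txdb.local are made up for illustration.

```r
library(parallel)
library(VariantAnnotation)
library(AnnotationDbi)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Small stand-in for the per-patient ranges from the question
pos.list <- list(
    pt1 = GRanges(seqnames = c('chr1', 'chr2', 'chr3'),
                  ranges = IRanges(start = rep(10^6, 3), width = 1)),
    pt2 = GRanges(seqnames = c('chr11', 'chr12', 'chr13'),
                  ranges = IRanges(start = rep(10^6, 3), width = 1))
)

# Write a copy of the annotation database to a temporary file
# (assumption: tempdir() is on local disk, not an NFS mount)
txdb.file <- tempfile(fileext = '.sqlite')
saveDb(txdb, file = txdb.file)

res <- mclapply(pos.list, function(gr) {
    # Open a fresh connection inside the child process instead of
    # reusing the handle inherited from the parent
    txdb.local <- loadDb(txdb.file)
    locateVariants(gr, txdb.local, AllVariants())
})
```

Each element of res should then correspond to one patient, as in the lapply version. If the error persists, the file system hint above is probably the next thing to check.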