Question

DEXSeq run does not produce results

0

Entering edit mode

9.9 years ago

adnan.niazi • 0

Hi all,

I am having problem in diff. exon usage analysis of my dataset. I generated count data using protocol mentioned in DEXSeq's manual. My runs seem to be halted after some time when DEXSeq function is called. I have also tried manual steps which are being called in DEXSeq function to see where the problem is. And it seems that estimateDispersions() has to o something with this problem. I am running my job on cluster with 16 cores (512GB) for 48 hours but no progress. The output remained the same as shown below without any changes after first two hours of the run. Below is the sample of my counts data, annotation file and script output produced after 48 hours. Can anybody help me resolving the issue or what I am doing wrong here?

COUNT FILE

ENSG00000000003:001    105
ENSG00000000003:002    145
ENSG00000000003:003    139
ENSG00000000003:004    136
ENSG00000000003:005    139
ENSG00000000003:006    17
ENSG00000000003:007    33
ENSG00000000003:008    46
ENSG00000000003:009    185
.
.
.
_ambiguous    9946
_ambiguous_readpair_position    0
_empty    9520711
_lowaqual    0
_notaligned    0

FLAT FILE:

chr1    dexseq_prepare_annotation.py    aggregate_gene    29554    31109    .    +    .    gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    29554    30039    .    +    .    transcripts "ENST00000473358"; exonic_part_number "001"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    30267    30365    .    +    .    transcripts "ENST00000469289"; exonic_part_number "002"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    30366    30503    .    +    .    transcripts "ENST00000607096+ENST00000469289"; exonic_part_number "003"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    30504    30563    .    +    .    transcripts "ENST00000469289"; exonic_part_number "004"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    30564    30667    .    +    .    transcripts "ENST00000469289+ENST00000473358"; exonic_part_number "005"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    30976    31097    .    +    .    transcripts "ENST00000469289+ENST00000473358"; exonic_part_number "006"; gene_id "ENSG00000243485"
chr1    dexseq_prepare_annotation.py    exonic_part    31098    31109    .    +    .    transcripts "ENST00000469289"; exonic_part_number "007"; gene_id "ENSG00000243485"

OUTPUT script.Rout

R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Creating a generic function for 'nchar' from package 'base' in package 'S4Vectors'
[Previously saved workspace restored]

>
> #source("https://bioconductor.org/biocLite.R")
> #biocLite("DEXSeq")
> #pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" )
> #list.files(pythonScriptsDir)
> #/Library/Frameworks/R.framework/Versions/3.2/Resources/library/DEXSeq/python_scripts
>
> library(DEXSeq)
Loading required package: BiocParallel
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:stats':

    xtabs

The following objects are masked from 'package:base':

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
    is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
    rownames, sapply, setdiff, sort, table, tapply, union, unique,
    unlist, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: DESeq2
Loading required package: Rcpp
Loading required package: RcppArmadillo
> inDir = "dexseq_counts"
> files_fc <- list.files(inDir, pattern = "*fc", full.names = TRUE)  #   ".tab$"
>
> #-----------------------------Creating meta-data-----------------------------------------------#
>
> sampleTable = data.frame(row.names = c("cer_fetal_cyto_063_4","cer_fetal_cyto_063_6","cer_fetal_cyto_063_8", "cer_fetal_nuc_064_4","cer_fetal_nuc_064_6","cer_fetal_nuc_064_8"),
+                          condition = c("cyto", "cyto", "cyto", "nuc", "nuc", "nuc"),
+                          individual = c( "1", "2", "3", "1", "2", "3"))
>
> #-------------------------Running DEXSeq---------------------------------------#
>
> BPPARAM = MulticoreParam(workers=16)
>
> fullmodel= ~ sample + exon + individual:exon + condition:exon
> reduced_model = ~ sample + exon + individual:exon
> flatfile = "Homo_sapiens.GRCh37.75_protein_lincRNA.DEXSeq.chr.gff"
>
> dxd_fc = DEXSeqDataSetFromHTSeq(files_fc,sampleData=sampleTable, design= fullmodel, flattenedfile = flatfile)
converting counts to integer mode
> dxr_fc = DEXSeq(dxd_fc, reducedModel = reduced_model,BPPARAM = BPPARAM, quiet = FALSE)
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix

Regards,
adnan

RNA-Seq Exon-usage DEXseq • 3.5k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.9 years ago by adnan.niazi • 0

1

Entering edit mode

is it still running?

ADD REPLY • link 9.9 years ago by Martombo ★ 3.2k

0

Entering edit mode

You might check to see if the job is getting killed automatically. The thing with clusters is that sometimes they have limits on how long a job can run. Perhaps this is just getting killed early and the notice of that just isn't getting written to a file. So do as Martombo suggested and check if it's still running and, if not, ask the cluster admin if the job is getting killed. If it is still running, you might be able to ask the admin to check and see if it's actually using the allocated cores or if it seems to just be sitting there and doing nothing (in which case he/she can just kill it).

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 9.9 years ago by Devon Ryan 105k