Question

Error encountered while running stattest from ballgown package

1

Entering edit mode

7.6 years ago

lakhujanivijay 5.9k

Command

results_transcripts=stattest(bg_filtered, feature="transcript", covariate="treatment", adjustvars=c("variety"), getFC=TRUE, meas="FPKM")

Error message:

Error in solve.default(t(mod) %*% mod) :   system is computationally singular: reciprocal condition number = 6.66115e-17

my mapping file:

pheno_data

    ids       treatment variety
1 112-C         control     112
2 112-S          stress     112
3 111-C         control     111
4 111-S          stress     111

Commands ran before running stattest:

bg = ballgown(dataDir=data_directory, samplePattern='11', meas='all', pData=pheno_data)
library(genefilter)
bg_filtered=subset(bg, "rowVars(texpr(bg)) > 1", genomesubset=TRUE)

SessionInfo

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] genefilter_1.56.0 ballgown_2.6.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9                AnnotationDbi_1.36.1
 [3] XVector_0.14.0             GenomicRanges_1.26.2
 [5] BiocGenerics_0.20.0        splines_3.3.1
 [7] zlibbioc_1.20.0            GenomicAlignments_1.10.0
 [9] IRanges_2.8.1              BiocParallel_1.8.1
[11] xtable_1.8-2               lattice_0.20-34
[13] GenomeInfoDb_1.10.2        tools_3.3.1
[15] SummarizedExperiment_1.4.0 parallel_3.3.1
[17] grid_3.3.1                 nlme_3.1-128
[19] Biobase_2.34.0             mgcv_1.8-12
[21] DBI_0.5-1                  survival_2.40-1
[23] digest_0.6.12              Matrix_1.2-6
[25] sva_3.22.0                 rtracklayer_1.34.1
[27] RColorBrewer_1.1-2         S4Vectors_0.12.1
[29] bitops_1.0-6               RCurl_1.95-4.8
[31] memoise_1.0.0              RSQLite_1.1-2
[33] limma_3.30.9               Biostrings_2.42.1
[35] Rsamtools_1.26.1           stats4_3.3.1
[37] XML_3.98-1.5               annotate_1.52.1

Troubleshooting from my side:

This is a known error (https://github.com/alyssafrazee/ballgown/issues/67), so here are the things I checked:

There are indeed some transcripts where the FPKM values are zero; e.g

checking for column1 and column2

any(texpr(bg_filtered)[,1]==0 & texpr(bg_filtered)[,2]==0)
[1] TRUE

checking for column3 and column4

any(texpr(bg_filtered)[,3]==0 & texpr(bg_filtered)[,4]==0)
[1] TRUE

checking for all the columns

any(texpr(bg_filtered)[,1]==0 & texpr(bg_filtered)[,2]==0 & texpr(bg_filtered)[,3]==0 & texpr(bg_filtered)[,4]==0)
[1] FALSE

* i.e. there are no rows where FPKM for all the samples is zero.

bioconductor RNA-Seq ballgown stattest • 4.6k views

ADD COMMENT • link updated 19 months ago by Ram 44k • written 7.6 years ago by lakhujanivijay 5.9k

0

Entering edit mode

Is your design correct? By looking at the ids you use, I think it's supposed to be:

    ids       treatment variety
1 112-C         control     **112**
2 112-S          stress     112
3 111-C         control     111
4 111-S          stress     111

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

0

Entering edit mode

My bad! that was just a typo while posting on Biostars. Not a problem with the actual file I am using. Just corrected my post!

ADD REPLY • link 7.6 years ago by lakhujanivijay 5.9k

0

Entering edit mode

Would be nice to mention that you crossposted this to https://support.bioconductor.org/p/95379/
Cross-posting is discouraged, because as such two communities will spend time trying to help you.

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

0

Entering edit mode

In the hope that somebody would reply as early as possible, yes I did infact posted it on Bioconductor. Deleted now on bioconductor. Any insights to the problem?

ADD REPLY • link 7.6 years ago by lakhujanivijay 5.9k

0

Entering edit mode

I can understand why you would post it on both biostars and bioconductor support, but at least be honest and mention that.

My first test would be to simplify the design and drop the 'variety' column and see if the error is reproduced. I think your design doesn't allow estimation of 'influence' of the variety covariate on the expression levels since you don't have replicates.

You know that your number of replicates is rather small?

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

0

Entering edit mode

Ok, thank you! Next time, I will try not to cross post or at least will disclose that I did that in the very beginning.

Answering related to the problem, in fact I do not have any replicates (I know I should have at least 3 for my results to be statistically significant). But, I can ignore this at this point just for the understanding.

I will give it a try as you said to confirm that this is indeed a problem with the number of replicates.

ADD REPLY • link 7.6 years ago by lakhujanivijay 5.9k

0

Entering edit mode

Unfortunately, dropping it does help in a rather unusual way; i.e. somehow ballgown interpret rest of the 2 samples as replicates and works! However, as soon as I try the same by creating a mapping file for only 2 samples (1 control and 1 treated) and loading those 2 samples as a ballgown object, it fails at the same step saying that

Error in stattest(bg_filtered, feature = "transcript", covariate = "treatment",  : 
  There must be at least two replicates per group. Make sure covariate is categorical; if continuous, consider the timecourse option, or specify your own models with mod and mod0.

At this point, I think that it is not possible to go on with ballgown without any replicates.

Just for understanding, what if I create pseudo-replicates by duplicating the same samples and try to import them. Technically speaking it should work but what could be the implications from statistical point of view?

ADD REPLY • link 7.6 years ago by lakhujanivijay 5.9k

0

Entering edit mode

You don't have replicates when you use this matrix:

    ids       treatment variety
1 112-C         control     112
2 112-S          stress     112
3 111-C         control     111
4 111-S          stress     111

Because your 4 real groups are, as judged by the program:

    ids       group 
1 112-C         control112
2 112-S          stress112
3 111-C         control111
4 111-S          stress111

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

0

Entering edit mode

Hi, Did you solve it? I have two samples and I just duplicated the file because we should have at least two replicates. But, I am getting the same error. Can you show me?

ADD REPLY • link 3.5 years ago by HaroonPakistan • 0

1

Entering edit mode

HaroonPakistan Unfortunately, I could not resolve the issue. I ended up using another tool in this case.

ADD REPLY • link 3.5 years ago by lakhujanivijay 5.9k