Question

Sinking into the swamp by Using DESeq without replicates !!

0

Entering edit mode

10.6 years ago

Whoknows ▴ 960

Hi friends

I have a project on Rice and my conditions (paired-end) unfortunately have no replicate.

I used tophat2 and after conversion to sam file is used HTSeq and I merged HTSeq output.

And today the bigest problem emereged, How can i handle DeSeq without replicate you can see my following codes:

pasillaCountTable <- read.table("MyCountTable.txt")
pasillaDesign = data.frame(row.names = colnames( pasillaCountTable ),condition = c( "C1","S1"),libType = c( "paired-end", "paired-end" ))
pairedSamples = pasillaDesign$libType == "paired-end"
countTable = pasillaCountTable[ , pairedSamples ]
condition = pasillaDesign$condition[ pairedSamples ]
cds = newCountDataSet( countTable, condition )
cds = estimateSizeFactors( cds )

and main part was, is it fine or not?

cds = estimateDispersions( cds, method="blind", sharingMode="fit-only" ,fitType="local")

and rest of the codes:

res_C1S1 = nbinomTest( cds, "C1", "S1" )
write.table(res_C1S1,"C1_S1.txt",sep= "\t")

Thanks a lot.

HTSeq DESeq RNA-Seq Tophat • 4.0k views

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.6 years ago by Whoknows ▴ 960

2

Entering edit mode

If you want to do analysis of differentially expressed genes, that just won't be possible without replicates. If you don't know what's the variance within the samples from the same group (either condition or control), you cannot possibly tell if the difference you see between different groups (condition vs control) is significant or not. You may try to take a look at dispersion values from the very similar experiment if such data is available.

ADD REPLY • link 10.6 years ago by Biomonika (Noolean) 3.2k

score 2 · Answer 1 · 2014-09-17

You just can't do a statistical analysis of differential expression without replicates. Basically all you can do is calculate the ratio between your sample and control, and then rank these, but it gives you no measure of confidence in the results, for obvious reasons. You can use DEseq to perform the normalization step with size factors, then calculate the log2 ratios with your normalized counts with just a bit of R code (you don't need DEseq for this part). But what this will give you, at best, is a list of top potential genes changing expression - you can filter by the magnitude of the log2 fold change and how highly expressed the genes are to begin with - there's usually a lot of variability in the low expressed genes. After that, you either need to do a ton of biological validation, or get a few more replicates!

Ram · Answer 2 · 2014-09-17

I'd start to look how your transcripts are grouping together (the package "MBCluster.Seq" works great for this). Depending on the comparisons you want to make, an estimated dispersion based on these groups can be used as a prior for inference/significance testing (whether NB, logl, or EM), e.g., prior.dispersion = mu + mu^2*d. Further, I would make it a point to work with the raw count data using library-size as your normalizing factor and filtering low-expressed transcripts before any testing or normalization takes place - in addition, transcript filtering post-normalization may introduce more bias into your analysis. At any rate, if you want to feel better about not having any replicates, try integrating more information as covariates into your analysis, e.g., gender, tissue, k-means, etc. This may help explain some of the variance confounded as a result of prior dispersion estimates. I'm a big fan of the package "edgeR" when it comes to RNA-Seq analysis - it gives you an option of using common and tag-wise dispersion estimates as well as a bit more intuitive script.

Cheers!