Question

DESeq: obtaining DEGs from uneven dataset (some with replicates, some without)

0

Entering edit mode

6.4 years ago

exin ▴ 60

I'm trying to perform differential gene expression analyses using DESeq, on a pre-existing dataset, where 8 cell types (A to H) were identified from MARS-Seq on whole organism. Gene expression counts are given for

A x 9 replicates
B x 1
C x 1
D x 3
E x 1
F x 1
G x 4
H x 1

In order to find genes differentially expressed in A, I contrast A against "notA" (replacing B to H with "notA")

dds<-DESeqDataSetFromMatrix(countData = cts, colData = coldata,design=~condition)
dds$condition<-factor(dds$condition,levels=c("A","notA"))
dds<-DESeq(dds)
alpha<-0.1
res<-results(dds,contrast=c("condition","A","notA"),alpha=alpha) 
reslfc<-lfcShrink(dds,contrast=c("condition","A","notA"),res=res, alpha=alpha)

and so on for the other 7 cell types.

However, I found that I get much more DEGs for A D & G, i.e. samples with more reps. In fact, I have no DEGs at all for B, C E, F and H.

Anyway I can overcome this? Or should I not be performing the analysis this way?

rna-seq gene expression • 1.6k views

ADD COMMENT • link 6.4 years ago by exin ▴ 60

1

Entering edit mode

Statistical power is (among other factors) a function of the replication number. Please use google and the search function for DESeq2-related questions on replication, replicate numbers and experiments with no replicates. You'll literally find dozens of posts, blogs, bioconductor questions and responses from (among others) Mike Love (the first author and maintainer of DESeq2).

ADD REPLY • link 6.4 years ago by ATpoint 88k

0

Entering edit mode

Thanks ATpoint. Besides treating it as an exploratory analysis / not drawing firm conclusions from the dataset, I'm wondering how exactly to augment the test / result, by accepting the assumption that variance-mean estimates from replicated samples can be applied to unreplicated ones?

ADD REPLY • link 6.4 years ago by exin ▴ 60

0

Entering edit mode

I think (but I am really not sure) that DESeq2 uses the dispersion estimates between the replicated group and projects them to the unreplicated one, but again, this is only something I faintly remember from a bioconductor post so better double check this.

ADD REPLY • link 6.4 years ago by ATpoint 88k