Question

DESEQ multifactor specific sample differentials

0

Entering edit mode

10.5 years ago

justinfanyi • 0

I have multifactor deseq design analysis (24 samples in total), without replicates (unfortunately), the collaborators however are aware about the confidence of the results but they are ready to do q-PCR of the highest and lowest fold changes counts from the deseq to validate since money has already been spent to sequence(big mistake)They want kind of indicators on DEGs.

The question I want to answer is which genes are involved in early molecular interactions after treatment,by extend find sample specific differentially expressed genes and common between two samples, A as base. The samples are (A ,B,C,D,E,F) each from different rice species (different resistance levels)

Sample of results expected:

DEG A vs. B at 24 hour after treatment
DEG A vs. B at 48 hour after treatment
….............................................................
….........................................................

Am clad deseq is still supported according to the manual update(http://bioconductor.org/packages/release/bioc/manuals/DESeq/man/DESeq.pdf), deseq2 is good but cant find clear direction on how to analyze samples without replicates

So far,this is what I have done(below-shorten) and am stuck on extraction DEG between two individual samples i.e A_24T vs. B_24T (DEG in sample A vs. sample B after treatment 24 hours)not sure if this how best DEG query is framed so as to extract counts data with foldchange > 2 between samples.

Kindly help me guys, thanks.

> library(DESeq)

> countsTable<-read.table("/home/biostat1/DESEQ2_2015/seqtable.txt",header=T,row.names=1)

> head(countsTable)

A_T24 B_T24 C_T24 D_T24 E_T24 F_T24 A_C24 B_C24 C_C24

ChrSy.fgenesh.exon.1 1 1 1 1 1 1 1 1 2

…..........................................................................................................................
…..........................................................................................................................

D_C24 E_C24 F_C24 A_T48 B_T48 C_T48 D_T48 E_T48 F_T48

ChrSy.fgenesh.exon.1 1 1 1 1 2 1 1 1 1

…..........................................................................................................................
…..........................................................................................................................

A_C48 B_C48 C_C48 D_C48 E_C48 F_C48

ChrSy.fgenesh.exon.1 1 1 1 1 1 1

............................................................................................................................
…..........................................................................................................................

> design <-read.table("/home/biostat1/DESEQ2_2015/seqdesign.txt",header=T,row.names=1)

> head(design)

infection time

A_T24 Treatment 24

B_T24 Treatment 24

….....................................

A_C24 Control 24

B_C24 Control 24
…......................................

….......................................

A_T48 Treatment 48

B_T48 Treatment 48

> design<- as.data.frame(design)

> cds <- newCountDataSet( countsTable, design)

> head(counts(cds))

A_T24 B_T24 C_T24 D_T24 E_T24 F_T24 A_C24 B_C24 C_C24

ChrSy.fgenesh.exon.1 1 1 1 1 1 1 1 1 2

….................................................................................................................................

> cds <- estimateSizeFactors ( cds )

> sizeFactors( cds)

A_T24 B_T24 C_T24 D_T24 E_T24 F_T24 A_C24 B_C24

0.9715319 0.9715319 0.9280621 0.9438743 0.9552564 0.9715319 0.9715319 0.9438743

…....................................................................................................................................

> cds = estimateDispersions(cds, method="blind", sharingMode="fit-only", fitType="local")

> fit1 = fitNbinomGLMs( cds, count ~ time + infection )

......................................................................................

> fit0 = fitNbinomGLMs( cds, count ~ time )

......................................................................................

> pvalsGLM = nbinomGLMTest( fit1, fit0 )

> padjGLM <- p.adjust (pvalsGLM, method = "BH")

> res <- cbind(fit1,pvalsGLM,padjGLM)

> resSig <- res[ res$padjGLM < 0.05, ]

Now stuck to query

1. DEG A vs. B at 24 hour after treatment??

….........................................................

RNA-Seq DESeq2 • 4.0k views

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by justinfanyi • 0

0

Entering edit mode

You might remind the collaborators that it's more expensive to do qPCR validation than to simply sequence more samples. They're just wasting time and money with this approach.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks Devon for your invaluable advice. All species are included in the design, just highlighted some because of space

I tried subset cds but I got the follow error:

> results = nbinomTest(cds, "A_T24", "B_T24")
Error in if (dispTable(cds)[condA] == "blind" || dispTable(cds)[condB] ==  :
  missing value where TRUE/FALSE needed

I cannot troubleshoot, am kind of new in this field of RNASeq.

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

other attached packages:
[1] DESeq_1.16.0        lattice_0.20-29     locfit_1.5-9.1
[4] Biobase_2.26.0      BiocGenerics_0.12.1

Regards

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by justinfanyi • 0

0

Entering edit mode

No, none of the species are included in your design. You included all of the samples, but no species information. Please reread the DESeq documentation. Also, the command you showed will never work, "A_T24" and "B_T24" don't exist in your design (the samples do, but the model doesn't have those coefficients).

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks for your quick response,perhaps have not made my question clear! I want to get DEG for specific samples against control (A) on particular time(24/48) after treatment. The samples(A_T24) for example is species of rice,the rest the same but from different species.

> design
      infection time
A_T24 Treatment   24
B_T24 Treatment   24
C_T24 Treatment   24
D_T24 Treatment   24
E_T24 Treatment   24
F_T24 Treatment   24
A_C24   Control   24
B_C24   Control   24
C_C24   Control   24

How can I include species offset (A, B, C, ...) like you have suggested yet they have different time points 24 or 48,treatment or control?

i.e results = nbinomTest(cds, "A", "B") how will I know is from 24 hour or 48 hour results after treatment!!

Kind regards.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by justinfanyi • 0

0

Entering edit mode

I already understood that. As I mentioned previously, you need to add species to your design:

> design
      infection time   species
A_T24 Treatment   24   A
B_T24 Treatment   24   B
C_T24 Treatment   24   C
D_T24 Treatment   24   D
E_T24 Treatment   24   E
F_T24 Treatment   24   F
A_C24   Control   24   A
B_C24   Control   24   B
C_C24   Control   24   C

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks Devon, the results I got is still the same after adding species!!

> results = nbinomTest(cds, "A", "B")
Error in if (dispTable(cds)[condA] == "blind" || dispTable(cds)[condB] ==  :
  missing value where TRUE/FALSE needed

> design
      infection time species
A_T24 Treatment   24       A
B_T24 Treatment   24       B
C_T24 Treatment   24       C
D_T24 Treatment   24       D
E_T24 Treatment   24       E
F_T24 Treatment   24       F
A_C24   Control   24       A
B_C24   Control   24       B
C_C24   Control   24       C
D_C24   Control   24       D
E_C24   Control   24       E
F_C24   Control   24       F
A_T48 Treatment   48       A
B_T48 Treatment   48       B
C_T48 Treatment   48       C
D_T48 Treatment   48       D
E_T48 Treatment   48       E
F_T48 Treatment   48       F
A_C48   Control   48       A
B_C48   Control   48       B
C_C48   Control   48       C
D_C48   Control   48       D
E_C48   Control   48       E
F_C48   Control   48       F

Could it be because there will be two sets of (A,B,C....) because of time...24 and 48? two much multifactors I case!!

Thanks once again for your help.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by justinfanyi • 0

0

Entering edit mode

Please stop posting your replies as answers. There are "ADD REPLY" and "ADD COMMENT" buttons for a reason.
What I showed was a way of including species information. This won't result in too many "multifactors". This isn't how it would make sense to specify things (you could, but you wouldn't find it easy to get the results you want), however. Instead, it would make more sense to just use the row names as the design.

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k

0

Entering edit mode

"ADD REPLY" and "ADD COMMENT" buttons are inactive with working (link) forwarding to (add your answer), anyway thanks a lot for your insights, I will consider doing away with multifactor altogether then do pairwise! because my previous attempt of row names as design failed(results = nbinomTest(cds, "A_T24", "B_T24"))

Kind regards

ADD REPLY • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by justinfanyi • 0

Ram · Answer 1 · 2015-02-10

0

Entering edit mode

10.5 years ago

Devon Ryan 105k

nbinomGLMTest() is performing a likelihood ratio test, so it's not producing the results for any of the pair-wise comparisons that you want. You probably want the nbinomTest() function instead. Alternatively, just subset cds after estimating dispersion.

BTW, you need to change your design. You haven't included a species offset (A, B, C, ...).

ADD COMMENT • link updated 3.3 years ago by Ram 45k • written 10.5 years ago by Devon Ryan 105k