Question

Deseq - Should I Include Only The Samples Being Compared?

4

11.2 years ago

jomaco ▴ 200

Hi,

In DESeq, if I have four samples (A, B, C, D) and only wish to compare A with B and C with D to test for differential expression, should I include all the samples in the analysis?

By this I mean, when performing the normalisation steps, will including the other 2 samples change how the data is normalised (given that the library sizes may be different for example)?

Thanks,

Jom.

deseq rna-seq • 2.9k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 11.2 years ago by jomaco ▴ 200

0

So you had 4 different treatments for basically the same samples?

ADD REPLY • link 11.2 years ago by Biomonika (Noolean) 3.2k

1

Sorry, each sample is taken from a different part of the plant. A = leaves (from leaf morphology mutant plant), B = flowers (from mutant), C = flowers (wildtype), D = leaves (wildtype). My aim is to test for genes in the leaf (mutant) sample which are upregulated over leaf (wt), and genes in the flower (mutant) sample which are upregulated over flower (wt). Ultimately, genes upregulated in the mutant (leaf and flower) over wildtype (leaf and flower) are the genes I'm interested in.

ADD REPLY • link 11.2 years ago by jomaco ▴ 200

2

From that, it sounds like you're best off loading all of the samples and just using a GLM, which DESeq can handle. BTW, I hope you have replicates from these.

ADD REPLY • link 11.2 years ago by Devon Ryan 104k
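
To make the GLM suggestion a bit more concrete, here is a minimal sketch (plain Python, not DESeq itself) of the design matrix that a two-factor model with an interaction would encode for the four samples described above. The column names, the choice of leaf/wildtype as reference levels, and the helper names `design_row` and `contrast` are all assumptions made for illustration; a real analysis would also need replicates for each group.

```python
# Illustrative only: sample labels follow the thread (A-D); the coding scheme
# (treatment coding, leaf and wildtype as reference levels) is an assumption.
samples = {
    "A": ("leaf", "mutant"),
    "B": ("flower", "mutant"),
    "C": ("flower", "wildtype"),
    "D": ("leaf", "wildtype"),
}

columns = ["intercept", "tissue_flower", "genotype_mutant", "flower_x_mutant"]


def design_row(tissue, genotype):
    """Return one design-matrix row for a tissue * genotype model."""
    is_flower = 1 if tissue == "flower" else 0
    is_mutant = 1 if genotype == "mutant" else 0
    return [1, is_flower, is_mutant, is_flower * is_mutant]


design = {name: design_row(*factors) for name, factors in samples.items()}


def contrast(sample_1, sample_2):
    """Coefficient weights for the comparison sample_1 minus sample_2."""
    return [a - b for a, b in zip(design[sample_1], design[sample_2])]


# Mutant vs wildtype within each tissue, expressed as weights on the columns.
leaf_mut_vs_wt = contrast("A", "D")
flower_mut_vs_wt = contrast("B", "C")

for name, weights in [("leaf", leaf_mut_vs_wt), ("flower", flower_mut_vs_wt)]:
    print(name, dict(zip(columns, weights)))
```

Under this coding, the leaf comparison depends only on the genotype coefficient, while the flower comparison is the genotype coefficient plus the interaction term, so both questions can be answered from a single fit on all samples.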

0

Question: Can I use a contrast to compare, say, just A to B once I load all of the samples into an R package?

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.5 years ago by scchess ▴ 640

score 0 · Answer 1 · 2013-09-21

Realistically speaking, the estimated library sizes (size factors) shouldn't change that drastically unless one of your samples is an outlier (e.g., you received very few reads from it, so it'll likely be useless anyway). The only benefit to including everything in the analysis from the get-go is that you only load the count data once and will likely only have one script. You might as well try both ways and check that there's very little difference between them.
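
As a rough way to see why the set of included samples can matter at all, here is a small sketch of median-of-ratios size-factor estimation, the approach DESeq uses for normalisation, written in plain Python rather than with the package itself. The counts in `toy_counts` are made up for illustration and the function name `size_factors` is hypothetical; the point is only that the per-gene reference (the geometric mean across samples) depends on which samples are loaded.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of per-gene counts
            (same gene order in every sample).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Per-gene log geometric mean across the included samples.
    # Genes with a zero count in any sample are skipped.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Made-up counts for five genes in four samples (illustration only).
toy_counts = {
    "A": [100, 200, 50, 400, 30],
    "B": [120, 180, 60, 390, 25],
    "C": [300, 90, 500, 800, 10],
    "D": [280, 100, 450, 900, 12],
}

all_four = size_factors(toy_counts)
pair_ab = size_factors({s: toy_counts[s] for s in ("A", "B")})

# Compare the relative scaling of A and B under the two setups.
print("A/B with all samples:", all_four["A"] / all_four["B"])
print("A/B with A and B only:", pair_ab["A"] / pair_ab["B"])
```

If two samples differ only by sequencing depth, their relative size factors come out the same whichever other samples are included; differences between the two setups arise when expression profiles differ, which is why comparing both ways on real data is a reasonable check.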