Question

Experimental design RNA-seq

5.7 years ago

Andrea.Wall ▴ 10

Hi everyone,

I have a question regarding my experimental design and a possible confounding factor. I am working on an RNA-seq project (honeybee brain mRNA) in which I will have 4 conditions with 10 biological replicates each. Based on other analyses of the same samples, I just realized that at the time of sampling the animals were transitioning between two behavioural states that are known to be associated with substantial changes in brain gene expression (ca. 40% of genes are known to be differentially expressed between the two).

So about 60% of my samples are in one state and the rest are in the other. Both behavioural groups are represented in each treatment, but not in a completely balanced way, as I only discovered this additional factor later. I am worried that a factor with such a large effect on gene expression may hamper the recovery of DEGs for the treatments I am interested in (which will likely have milder effects than behavioural state). I am also wondering whether it would be better to have the sampling fully balanced: 20 samples in one state and 20 in the other, with 5 samples per treatment in each state. Or could I account for behaviour as a covariate in my differential expression analyses equally well without perfect balancing of the factor?

Since I have not yet prepared the libraries, would you suggest I go back and change some of the samples so that the behaviours are 50/50 and equally represented across treatments?

Thanks! Andrea

RNA-Seq • 1.4k views

Andrea.Wall, see if this helps: https://bioconductor.org/packages/release/bioc/html/RNASeqPower.html

5.7 years ago by cpad0112 21k

score 2 · Answer 1 · 2019-03-17

5.7 years ago

Friederike 9.0k

If you still have the option to optimize your experimental set-up, I would definitely encourage you to do that.

There are ways to account for effects like this, even with unbalanced samples, but it'd be a pity to spend your 10 replicates on accounting for effects that you may not be interested in (assuming you don't care about the changes introduced by the difference in behavioral state). If you're mostly interested in the effect of your treatment, it'd be best to figure out which of the two behavioral states is the one you actually care about and focus on that one, especially since you already mentioned that the treatment effects are probably going to be more subtle than the changes induced by the behavioral switch.
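To give a feel for what imbalance costs, here is a minimal sketch (not a full analysis) of how precisely one treatment contrast can be estimated once behaviour is included as a covariate. It assumes a simplified per-gene linear model, `expression ~ behaviour + treatment`, fitted to just two of the four treatments, with equal residual variance across samples. Under those assumptions the variance of the treatment effect is `sigma^2 / sum_s(n_A,s * n_B,s / (n_A,s + n_B,s))`, summed over behavioural states `s`. The function name and the unbalanced counts below are hypothetical and only for illustration, and the one residual degree of freedom spent on the covariate is ignored.

```python
from fractions import Fraction


def adjusted_contrast_variance(counts):
    """Variance (in units of sigma^2) of the treatment A vs B effect
    in the additive model  expression ~ behaviour + treatment.

    counts: one (n_A, n_B) pair per behavioural state, giving the number
    of samples from treatment A and treatment B in that state.
    """
    # Each state contributes n_A * n_B / (n_A + n_B) units of information;
    # a state that lacks one of the two treatments contributes nothing.
    information = sum(
        Fraction(n_a * n_b, n_a + n_b) for n_a, n_b in counts if n_a and n_b
    )
    return 1 / information


# Fully balanced: 5 + 5 samples per treatment in each behavioural state.
balanced = adjusted_contrast_variance([(5, 5), (5, 5)])    # 1/5

# Hypothetical imbalance: treatment A is 8/2, treatment B is 4/6.
unbalanced = adjusted_contrast_variance([(8, 4), (2, 6)])  # 6/25

print("balanced:  ", balanced)
print("unbalanced:", unbalanced)
print("relative efficiency:", float(balanced / unbalanced))
```

In this toy case the balanced layout gives the same variance as an unadjusted 10 vs 10 comparison (`1/10 + 1/10`), while the unbalanced layout inflates it by a factor of 1.2. So the covariate can still be modelled without perfect balance, just with somewhat less precision, and the loss grows as treatment and behaviour become more entangled.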

Best of luck!

Dear Friederike,

Thanks for your helpful input on this. I am still very much in doubt about the best way to move forward. Even though the direct effect of the behavioural state is not what I am interested in, I would like to know to what extent the treatment effects are general across the two states, or whether there is an interaction between treatment and behavioural group. So I guess in that case the best way to go would be to have the samples balanced across behaviours and treatments, right? On the other hand, it would be great to be able to calculate how much statistical power to detect treatment DEGs I would lose by including samples from both behavioural groups. I would then have a factor with two levels for behaviour with n=20 each and a factor with 4 levels for treatment with n=10 each, divided into 5 and 5 for each behaviour. Is there any way to estimate how strong the treatment effect would have to be, given that behaviour affects about 40% of the genes, to detect a reasonable number of the genes affected by the treatment?

Thanks again.

Best, Andrea

5.7 years ago by Andrea.Wall ▴ 10
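One rough way to approach the "how strong would the effect have to be" question is a back-of-the-envelope power calculation. The sketch below uses a normal-approximation sample size formula for a two-group comparison of a single gene, `n = 2 * (z_(1-alpha/2) + z_power)^2 * (1/depth + cv^2) / ln(fold_change)^2`, which to my understanding is the kind of calculation behind the RNASeqPower package linked above (treat that attribution as an assumption and check the package documentation). The function names, the depth of 20 reads per gene and the biological coefficient of variation of 0.4 are placeholders, not values from this experiment, and `alpha` is a per-gene level with no multiple-testing correction.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the two normal quantiles used by the approximation."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def samples_per_group(fold_change, depth, cv, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect a given fold change."""
    z = _z_sum(alpha, power)
    return 2 * z**2 * (1 / depth + cv**2) / log(fold_change) ** 2


def min_detectable_fold_change(n_per_group, depth, cv, alpha=0.05, power=0.8):
    """Same formula solved for the fold change at a fixed group size."""
    z = _z_sum(alpha, power)
    return exp(sqrt(2 * z**2 * (1 / depth + cv**2) / n_per_group))


# Placeholder inputs: 20 reads per gene, biological CV of 0.4.
for n in (10, 5):
    fc = min_detectable_fold_change(n, depth=20, cv=0.4)
    print(f"n = {n:2d} per treatment -> detectable fold change ~ {fc:.2f}")
```

Comparing `n = 10` (both behavioural states kept, behaviour modelled as a covariate) with `n = 5` (one state only) gives a first impression of the trade-off. For genes that respond to behaviour, leaving behaviour out of the model would act like a larger `cv`, which pushes the detectable fold change up.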