Hi all,
We ended up with an RNA-seq experiment that has a confounding batch effect.
time batch
1 0 1
2 0 1
3 1.5 1
4 1.5 1
5 1.5 1
6 3 1
7 3 1
8 3 1
9 3D 2
10 3D 2
11 3D 2
12 12 1
13 12 1
14 12 1
15 24 1
16 24 1
17 24 1
18 24 1
19 24c 1
20 24c 1
I need to compare 3 vs 0 and 3D vs 0, but 3 and 3D are two different treatments at the same time point, and they are in different batches, so it's impossible to separate the biological effect from a possible batch effect.
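The confounding described above can be checked mechanically. The following is a minimal sketch in plain Python, assuming an additive model of the form ~ batch + time with time point 0 as the reference level (the model form and the helper name matrix_rank are assumptions for illustration, not something stated in the post). It builds the design matrix from the sample table and shows that the batch column is identical to the 3D column, so the matrix is not full rank.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Sample table from the post: 20 samples, only the 3D samples are in batch 2.
times = ["0"] * 2 + ["1.5"] * 3 + ["3"] * 3 + ["3D"] * 3 + ["12"] * 3 + ["24"] * 4 + ["24c"] * 2
batches = [2 if t == "3D" else 1 for t in times]

# Columns: intercept, one dummy per non-reference time level, one dummy for batch 2.
levels = ["1.5", "3", "3D", "12", "24", "24c"]
design = [[1] + [int(t == lv) for lv in levels] + [int(b == 2)]
          for t, b in zip(times, batches)]

n_cols = len(design[0])
rank = matrix_rank(design)
print(n_cols, rank)  # 8 7 -> one column is redundant (batch 2 == time 3D)
```

With 8 columns but rank 7, the 3D coefficient and the batch coefficient cannot be estimated separately from the design alone, which is why extra information such as negative control genes is needed.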
However, I can define negative control genes: genes that are not affected by the two types of treatment, but probably are affected by the batch effect. So my idea is to use the RUVSeq package (the RUVg function), which in theory can infer the batch effect from these negative controls and account for it in the downstream DESeq2 analysis.
I'd appreciate it if anyone could comment on whether this approach is reasonable, and whether anyone has had a similar situation before. Thank you in advance.
Disclaimer: posted in Bioconductor with no response.
I don't think there is any tool or method that can do this for you. It is a flaw in your design, and there is no magic trick to repair it. Start over and think about the design first.
Well, no magic here. The package assumes that you know in advance which genes do not change expression in response to the treatments; if you see a change in those genes, it is attributed to the batch effect. Here is the manual: http://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf
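To make that assumption concrete, here is a toy sketch in plain Python of the general idea: take log-expression of the negative control genes only, center each gene, and use the dominant direction of variation across samples as an estimate of the unwanted factor. This is not RUVSeq code. It estimates a single factor, uses power iteration in place of a full SVD, and the numbers and names (controls_log, first_factor) are invented for illustration.

```python
import math

def center_columns(rows):
    """Subtract each gene's (column's) mean across samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def first_factor(centered, iters=100):
    """Top left singular vector of the samples x genes matrix, by power iteration."""
    n = len(centered)
    # Sample-by-sample Gram matrix (Y Y^T) of the centered control genes.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[j])) for j in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical log-expression of 3 control genes in 6 samples (rows = samples).
# The last two samples come from a second batch and are shifted upwards.
controls_log = [
    [5.0, 7.0, 6.0],
    [5.1, 6.9, 6.0],
    [4.9, 7.1, 6.1],
    [5.0, 7.0, 5.9],
    [5.8, 7.5, 6.6],
    [5.9, 7.6, 6.5],
]
batch = [1, 1, 1, 1, 2, 2]

W = first_factor(center_columns(controls_log))
w_batch1 = [w for w, b in zip(W, batch) if b == 1]
w_batch2 = [w for w, b in zip(W, batch) if b == 2]
print("batch 1:", w_batch1)
print("batch 2:", w_batch2)
```

In this toy data the two batches land on opposite sides of zero in the estimated factor, because the control genes are assumed to carry no treatment signal. If that assumption fails, the factor absorbs biology as well.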
You mentioned "two types of treatments". If there is a third factor, "treatment", in addition to time and batch that is not shown in the design matrix above, can you add it? Colleagues have had some good experience with RUVg in general, but I'm not clear that your design matches their experience. Other questions include how confident you are that you can identify negative control genes, and how confident you are that batch effects are not gene-dependent. What's the source of the batch effect?
Hi Ahill, the two treatments are 3 and 3D, i.e. human cells incubated for 3 hours with viable fungal cells (3) and human cells incubated with dead fungal cells (3D). So basically the time effect cancels out. I did not receive any feedback from the developer or the Bioconductor community, but I used the package anyway and it gave biologically meaningful results, so I tend to trust them.
Regarding the control genes, I chose genes that are not differentially expressed during the whole time course of infection. On top of that, we have the same design with 3 other fungal species, so I took the common non-DE genes across the infections with all 4 species (we see that the human response is quite similar to all of them). Of course I cannot be 100% sure that these control genes (around 200 highly expressed genes) are completely unaffected, but I guess there is a pretty high chance of that - if these genes do not change expression levels when exposed to 4 different fungal pathogens, I assume they will also be unaffected when exposed to dead fungal cells of that same species. Does that sound reasonable?
Regarding whether the batch effect is gene-dependent, my assumption (which might be wrong) is that the batch effect is systematic across all genes, but to be honest I have no idea. In our case there could be many sources of the effect: the experiments were done 1 year apart (we are basically testing a hypothesis based on the 1st batch data), by the same person, but library preps and sequencing were done by different people.
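The control-gene selection described above (non-DE in every species, and highly expressed) amounts to a set intersection followed by an expression filter. A minimal sketch in plain Python, where the gene IDs, species labels, expression values, and the MIN_MEAN cutoff are all hypothetical placeholders rather than values from the actual analysis:

```python
# Hypothetical inputs: genes called non-DE over the whole time course, per species.
non_de_by_species = {
    "species_A": {"g1", "g2", "g3", "g5"},
    "species_B": {"g1", "g2", "g4", "g5"},
    "species_C": {"g1", "g2", "g5", "g6"},
    "species_D": {"g1", "g2", "g5", "g7"},
}

# Hypothetical mean normalized expression per gene.
mean_expression = {
    "g1": 1500.0, "g2": 12.0, "g3": 300.0, "g4": 950.0,
    "g5": 800.0, "g6": 40.0, "g7": 2100.0,
}

MIN_MEAN = 100.0  # assumed cutoff for "highly expressed"

def select_controls(non_de_sets, expression, min_mean):
    """Genes that are non-DE in every species and expressed above the cutoff."""
    common = set.intersection(*non_de_sets.values())
    return sorted(g for g in common if expression.get(g, 0.0) >= min_mean)

controls = select_controls(non_de_by_species, mean_expression, MIN_MEAN)
print(controls)  # ['g1', 'g5']
```

Here g2 is non-DE in all four species but falls below the expression cutoff, so it is dropped. The resulting list is what would be passed on as the negative control set.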