4.2 years ago
DeadlyFall
Hello all,
I'm currently attempting to analyze some RNA-Seq data using DESeq2 and am having a hard time because this experiment has no biological replicates, yet it has 3 different variables. I question whether I should even be using DESeq2 in this situation, and what a better alternative would be.
Here is my sample table:
sample  disease  condition  cell_line
1       yes      cond1      76
2       yes      cond2      76
3       yes      cond3      76
4       yes      cond1      81
5       yes      cond2      81
6       yes      cond3      81
7       no       cond1      K
8       no       cond2      K
9       no       cond3      K
10      no       cond1      N
11      no       cond2      N
12      no       cond3      N
13      no       cond1      S
14      no       cond2      S
15      no       cond3      S
Currently I've treated the cell lines as biological replicates, but I'm sure my results are now confounded by that. What is the best way to handle an experiment like this in DESeq2 or in general?
You are correct in thinking that the factors here are confounded. There is no statistically sound way to say whether a difference between cond1 and cond2 is an actual difference between the conditions or a difference between the cell lines (76/81/K/N/S).
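To see exactly how the factors overlap, it can help to cross-tabulate the sample table. Below is a minimal sketch in plain Python. The rows are copied from the table in the question; the helper name `crosstab` and the variable names are made up for illustration and are not part of DESeq2 or any other package.

```python
from collections import defaultdict

# Sample table from the question: (sample, disease, condition, cell_line)
samples = [
    (1, "yes", "cond1", "76"), (2, "yes", "cond2", "76"), (3, "yes", "cond3", "76"),
    (4, "yes", "cond1", "81"), (5, "yes", "cond2", "81"), (6, "yes", "cond3", "81"),
    (7, "no", "cond1", "K"), (8, "no", "cond2", "K"), (9, "no", "cond3", "K"),
    (10, "no", "cond1", "N"), (11, "no", "cond2", "N"), (12, "no", "cond3", "N"),
    (13, "no", "cond1", "S"), (14, "no", "cond2", "S"), (15, "no", "cond3", "S"),
]


def crosstab(rows, i, j):
    """Count samples for each combination of column i and column j (hypothetical helper)."""
    counts = defaultdict(int)
    for row in rows:
        counts[(row[i], row[j])] += 1
    return dict(counts)


# Which disease statuses does each cell line appear with?
disease_by_line = defaultdict(set)
for _, disease, _, cell_line in samples:
    disease_by_line[cell_line].add(disease)

# True if every cell line belongs to exactly one disease status.
disease_nested_in_line = all(len(d) == 1 for d in disease_by_line.values())

# Number of samples in each (condition, cell_line) combination.
cond_by_line = crosstab(samples, 2, 3)
max_samples_per_combo = max(cond_by_line.values())

print("each cell line has a single disease status:", disease_nested_in_line)
print("condition x cell_line combinations:", len(cond_by_line))
print("largest number of samples in one combination:", max_samples_per_combo)
```

For this table, every cell line maps to a single disease status, and each of the 15 condition/cell line combinations contains exactly one sample, so no combination has a replicate.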
Thank you for confirming. It is common that I receive such experiments unfortunately and it is not fun to tell the researcher that their data won't allow them to answer their question : / It's nice to have others state the same so they know it is not just my opinion.
One option is to do an exploratory data analysis. Run a PCA on the top N genes and plot the samples labelled by cell line. If samples from the same cell line cluster together, there is definitely a batch effect. The reverse does not necessarily hold, though: a lack of clustering does not prove that there is no batch effect.
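The PCA idea above can be sketched roughly as follows, using only the Python standard library. Everything here is an assumption for illustration: the expression values are invented toy numbers (not data from this experiment), "top N" is taken to mean the N genes with the highest variance across samples, and the function names `top_variable_genes` and `first_pc_scores` are hypothetical. In practice you would run this on transformed counts (DESeq2's own `plotPCA` does a similar top-variance selection through its `ntop` argument).

```python
from statistics import mean, variance

# Toy, made-up expression values: gene -> one value per sample.
expr = {
    "g1": [10.0, 11.0, 10.5, 2.0, 1.0, 1.5],
    "g2": [5.0, 5.1, 4.9, 5.0, 5.1, 4.9],
    "g3": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
    "g4": [1.0, 2.0, 1.0, 6.0, 7.0, 6.0],
}
# Hypothetical cell-line label for each sample, in the same order.
labels = ["A", "A", "A", "B", "B", "B"]


def top_variable_genes(expr, n):
    """Return the n gene names with the highest variance across samples."""
    return sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:n]


def first_pc_scores(expr, genes, iters=200):
    """PC1 score per sample, via power iteration on the sample-by-sample Gram matrix."""
    # Center each gene across samples.
    centered = []
    for g in genes:
        m = mean(expr[g])
        centered.append([v - m for v in expr[g]])
    n = len(centered[0])
    # Gram matrix G = X X^T, where X is the samples-by-genes centered matrix.
    gram = [[sum(row[i] * row[j] for row in centered) for j in range(n)]
            for i in range(n)]

    def matvec(v):
        return [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]

    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        new = matvec(vec)
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0:
            raise ValueError("no variance in the selected genes")
        vec = [x / norm for x in new]
    # Leading eigenvalue of G is the squared singular value; scores = u1 * s1.
    eigenvalue = sum(a * b for a, b in zip(vec, matvec(vec)))
    return [u * eigenvalue ** 0.5 for u in vec]


genes = top_variable_genes(expr, 2)
scores = first_pc_scores(expr, genes)
for label, score in zip(labels, scores):
    print(label, round(score, 2))
```

If the samples sharing a label end up on the same side of PC1, as they do in this toy data by construction, that is the clustering pattern described above. The sign of a principal component is arbitrary, so only the grouping matters, not which side is positive.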
Thank you for the input! I have performed this using the top 25 genes from one of the comparisons, and luckily the cell lines did not make too much of a difference. Most of the variation was explained by PC1, and there was a nice split by disease within a condition, with cell lines scattering a bit along PC2.