Hi all,
I have a question related to this previous post, Technical/Biological Replicates In Rna-Seq For Two Cell Lines, but different in a few ways.
I'd greatly appreciate your help.
I have different cell lines derived from human fibroblasts, which I have grown in vitro and prepared RNA-seq libraries from. I want to find genes differentially expressed between two conditions using DESeq2.
For group 1 (WT/non-disease), I have 4 lines (4 individuals): 1A, 1B, 1C, 1D.
For group 2 (disease), I have just 2 lines (2 individuals): 2A, 2B.
Furthermore, from WT/non-disease lines 1A and 1B, I have 3 different colonies from each, grown separately for 30 days (1A-1, 1A-2, 1A-3 and 1B-1, 1B-2, 1B-3).
From WT/non-disease lines 1C and 1D, I only grew one colony from each for 30 days (1C-1, 1D-1).
For the disease lines 2A and 2B, I have 2 different colonies from one, grown separately for 30 days (2A-1, 2A-2), while for the other, I have 3 different colonies (2B-1, 2B-2, 2B-3).
For each colony, I did only one RNA extraction and library prep + sequencing, so I have no strictly technical replicates, for a total of 13 libraries, each from a distinct colony grown separately for 30 days, albeit some from the same human cell line.
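To make the layout easier to follow, here it is written out as a sample sheet. This is only a minimal sketch in plain Python; the column names (`condition`, `individual`) and the labels `WT`/`disease` are my own naming for illustration, not anything DESeq2 requires.

```python
from collections import Counter

# Colonies per cell line, as described above.
# Keys: individual (cell line); values: (condition, number of colonies).
design = {
    "1A": ("WT", 3),
    "1B": ("WT", 3),
    "1C": ("WT", 1),
    "1D": ("WT", 1),
    "2A": ("disease", 2),
    "2B": ("disease", 3),
}

# One row per library: sample name, condition, individual.
sample_sheet = [
    {"sample": f"{line}-{i}", "condition": cond, "individual": line}
    for line, (cond, n) in design.items()
    for i in range(1, n + 1)
]

# Tally libraries and individuals per condition to show the imbalance.
libraries_per_condition = Counter(row["condition"] for row in sample_sheet)
individuals_per_condition = Counter(cond for cond, _ in design.values())

for row in sample_sheet:
    print(row["sample"], row["condition"], row["individual"], sep="\t")
print(dict(libraries_per_condition))
print(dict(individuals_per_condition))
```

The tallies show the two levels of replication side by side: libraries per condition versus individuals per condition.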
My gut feeling is that each library should be a biological replicate since they were all derived from separate colonies and are not really technical replicates, but I am also aware that there are two levels of biological variation in my experiment -- some colonies are from different humans, and others are colonies from the same human grown in parallel for 30 days.
Should I collapse the colonies from the same human individuals into a single column? Or is it better to keep each colony as a separate biological replicate given that it is capturing more variation for the condition than expected by just being a technical replicate?
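To be concrete about what I mean by "collapse": as far as I understand it, this means summing the raw counts of all libraries from the same individual into a single column (which I believe is what the `collapseReplicates` function in DESeq2 does). Below is a toy sketch in plain Python for one hypothetical gene; the counts are invented purely for illustration and the helper `collapse_by_individual` is my own name, not a library function.

```python
from collections import defaultdict

# Toy raw counts for ONE hypothetical gene. These numbers are made up
# for illustration only and are not taken from the real experiment.
counts = {
    "1A-1": 10, "1A-2": 12, "1A-3": 11,
    "1B-1": 20, "1B-2": 18, "1B-3": 22,
    "1C-1": 15,
    "1D-1": 9,
    "2A-1": 30, "2A-2": 28,
    "2B-1": 40, "2B-2": 35, "2B-3": 45,
}

def collapse_by_individual(sample_counts):
    """Sum counts over colonies that share a cell line prefix (e.g. '1A')."""
    collapsed = defaultdict(int)
    for sample, count in sample_counts.items():
        individual = sample.split("-")[0]  # '1A-2' -> '1A'
        collapsed[individual] += count
    return dict(collapsed)

collapsed = collapse_by_individual(counts)
print(len(counts), "columns before collapsing,", len(collapsed), "after")
print(collapsed)
```

After collapsing, I would be left with 4 WT columns and 2 disease columns, and the lines with more colonies would have a higher summed count than the single-colony lines.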
Thank you so much for your response, Dario. Your logic seems sound to me.
Some colleagues are suggesting to me that by keeping the different colonies separate, I am overestimating the statistical power I have and that I should collapse them per individual... but I really feel that this is relevant variation that should be taken into account.
I think I will go with keeping them separate. What I am unsure of is whether having an unequal number of samples per individual (cell line) AND per condition is detrimental to the analysis. For two WT lines, I have 3 colonies each, and for the other two WT lines, I have just one colony each. For the disease group, on the other hand, I have one line with 3 colonies and one line with just 2 colonies.
Do you think I should eliminate some samples in order to even out the matrix across conditions/lines?
Thanks again.
Fantastic description dariober :)