Hi everybody,
This is my first post here, so first I would like to thank the active people on this forum, because it has been very helpful to me :)
Read counts for 935 cancer cell lines are available here: https://ocg.cancer.gov/ctd2-data-project/translational-genomics-research-institute-quantified-cancer-cell-line-encyclopedia There is just one set of read counts for each cell line.
I took some of these cell lines and split them into two groups, "resistant" and "sensitive". Now I would like to run edgeR or DESeq2 for differential gene expression.
For example, let's say I have 6 cell lines: A, B, C, D, E, and F, with their read counts. In the "resistant" group, I have A, B and C. In the "sensitive" group, I have D, E and F. I would like to use DESeq2 or edgeR in order to assess differentially expressed genes.
So in this configuration, there are no technical replicates, but the 3 different cell lines of each group are considered as replicates. Is it correct to do this?
Thank you very much.
In your design, your replicates are the group members: 3 cell lines in the "resistant" group and 3 cell lines in the "sensitive" group.
Whether these cell lines should be treated like replicates is a question that you, the scientist, need to answer. Does it make sense to treat these chosen cell lines as members of the same group? You can also check whether an MDS plot in edgeR groups these "replicates" together.
PS. You don't need technical replicates for analysis in edgeR or DESeq2.
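To make the "do the replicates group together?" check concrete, here is a minimal sketch in plain Python. It is not edgeR's MDS plot, only the same idea in a simpler form: compute a simple log2-CPM profile per cell line and compare distances within groups to distances between groups. The counts, the four-gene table, and the pseudocount of 0.5 are all made up for illustration.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical read counts: one list of 4 genes per cell line (invented numbers).
counts = {
    "A": [100, 200, 50, 400],
    "B": [110, 190, 60, 390],
    "C": [90, 210, 55, 410],
    "D": [300, 80, 50, 400],
    "E": [320, 70, 45, 380],
    "F": [280, 90, 60, 420],
}
group = {"A": "resistant", "B": "resistant", "C": "resistant",
         "D": "sensitive", "E": "sensitive", "F": "sensitive"}

def log_cpm(sample, pseudocount=0.5):
    """Simple log2 counts-per-million; an assumption, not edgeR's exact formula."""
    total = sum(sample)
    return [math.log2((c + pseudocount) / total * 1e6) for c in sample]

def distance(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

profiles = {name: log_cpm(sample) for name, sample in counts.items()}

# Sort every pair of cell lines into "same group" or "different group".
within, between = [], []
for s1, s2 in combinations(sorted(profiles), 2):
    d = distance(profiles[s1], profiles[s2])
    (within if group[s1] == group[s2] else between).append(d)

mean_within = mean(within)
mean_between = mean(between)
print(f"mean within-group distance:  {mean_within:.2f}")
print(f"mean between-group distance: {mean_between:.2f}")
```

If the within-group distances are not clearly smaller than the between-group ones on your real data, the cell lines are probably not behaving like replicates of one condition.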
Also, the resistant/sensitive classification may not give you what you want. The cell lines you group together must share some specific features, and only the genes related to those shared features should be taken seriously after the analysis. Even then, it does not feel very comfortable. It's like grouping red apple, strawberry, and cherry vs. banana, lemon, and corn based on their colors.
@firatuyulur exactly! But I thought that dispersion estimation is supposed to take that into account...
I don't remember exactly where in the documentation it says this for DESeq, but you should only estimate the dispersion for conditions with multiple biological replicates. You can still compare a group that has a single biological replicate with another group using that estimate, but the comparison will likely not be very accurate.
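A rough way to see why replicates are needed: under a negative binomial model the variance of a gene's counts is mu + phi * mu^2, so a crude method-of-moments estimate of the dispersion phi needs a sample variance, and that needs at least two samples. The sketch below is only that crude per-gene estimate, not what DESeq2 or edgeR actually compute (both share information across genes), and the function name and counts are invented for illustration.

```python
from statistics import mean, variance

def moment_dispersion(gene_counts):
    """Crude method-of-moments dispersion for one gene in one condition.

    Solves var = mu + phi * mu^2 for phi. Returns None when there are
    fewer than two replicates, because no sample variance exists then.
    """
    if len(gene_counts) < 2:
        return None
    mu = mean(gene_counts)
    if mu == 0:
        return 0.0
    # Clamp at zero: variance below the mean gives a negative raw estimate.
    return max((variance(gene_counts) - mu) / mu ** 2, 0.0)

three_replicates = [100, 150, 60]   # hypothetical counts, 3 cell lines
single_sample = [100]               # hypothetical counts, 1 cell line

disp_three = moment_dispersion(three_replicates)
disp_one = moment_dispersion(single_sample)

print("dispersion from 3 replicates:", round(disp_three, 4))
print("dispersion from 1 sample:", disp_one)
```

With a single sample the estimate is simply undefined, which is why the dispersion has to come from the group that does have replicates.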