I want to do a differential expression analysis on RNA-seq data from leukemia patients for the first time. I have two questions:
There is a dataset in GEO containing samples from 2 leukemia patients and data from 2 normal cell lines. Is this sample size enough for a meaningful result? Is this comparison valid?
Can I compare data from different GEO datasets with each other?
There is a dataset in GEO containing samples from 2 leukemia patients and data from 2 normal cell lines. Is this sample size enough for a meaningful result? Is this comparison valid?
No. 2 vs 2 is far too underpowered for human data. Also, there are very few cell lines that are actually "normal" in the sense of not being malignantly transformed; those that exist are typically immortalized, for example by overexpression of a transcription factor such as Hoxb8. Even with these, the differentially expressed genes would mostly reflect the culture conditions (the fact that the cells live in a Petri dish) and the overexpression signature of the transcription factor, rather than any meaningful leukemia-vs-normal signature. Unfortunately, it is not that simple. After all, the cell-of-origin debate is as old as leukemia research itself, and a cell line does not suffice as a normal reference. This analysis is almost certainly not going to be meaningful.
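A small sketch of the two points above, in Python using only the standard library. Part 1 runs an exact, distribution-free permutation test (a deliberately simple stand-in, not what DESeq2 or edgeR do) on hypothetical, perfectly separated values: with 2 samples per group there are only C(4,2) = 6 ways to assign the labels, so even perfect separation cannot give a two-sided p-value below 1/3. Part 2 encodes the design as a matrix with hypothetical columns `is_leukemia` and `is_cell_line`: when every leukemia sample is a primary patient sample and every normal sample is a cell line, the two columns carry the same information, the matrix is rank-deficient, and no model can separate the disease effect from the culture effect.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every split of the pooled values into groups of the original
    sizes and returns the fraction of splits whose absolute mean difference
    is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        # Small tolerance guards against floating-point round-off.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
    return Fraction(hits, total)


def matrix_rank(rows):
    """Rank of a small matrix, by exact Gaussian elimination over fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


# 1) Hypothetical log-expression values for one gene, perfectly separated.
leukemia = [10.2, 11.0]
normal = [3.1, 2.8]
print("2 vs 2, perfectly separated:", exact_permutation_p(leukemia, normal))

# Best case for n per group: every value in one group exceeds the other's.
for n in range(2, 6):
    p = exact_permutation_p([100 + i for i in range(n)], list(range(n)))
    print(f"{n} vs {n}: smallest possible p = {p} (~{float(p):.3f})")

# 2) Design matrix columns: intercept, is_leukemia, is_cell_line.
#    Every leukemia sample is primary, every normal sample is a cell line.
confounded = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
#    Hypothetical balanced design: each condition has both sample types.
balanced = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 0, 1]]
for name, design in [("confounded", confounded), ("balanced", balanced)]:
    print(f"{name}: rank {matrix_rank(design)} of {len(design[0])} columns")
```

Parametric tools such as DESeq2, edgeR and limma share dispersion information across genes and can report smaller p-values at n = 2, but those p-values then lean heavily on the model assumptions holding. No model can undo the confounding shown in part 2; only a design that includes both sample types per condition can.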
Can I compare data from different GEO datasets with each other?
Thank you for your answers.
You can use samples from TCGA (The Cancer Genome Atlas): www.nejm.org/doi/full/10.1056/nejmoa1301689