When performing differential gene expression (DGE) analysis on pairwise differences (contrasts), how can we identify which group a deregulated gene originates from? For instance, if we compare Case/Control (Tumor/Normal) and observe that the gene IGF2 is significantly upregulated, how can we determine whether this upregulation originates from the Tumor or the Control group? Typically, we assume that it comes from the Tumor because of the magnitude of the expression difference (and the direction of the contrast design). However, if we reverse the contrast to Normal/Tumor, would this change the results and the interpretation of which group the gene is coming from? What happens when we define multiple contrasts? Should we then rely on average gene expression values?
Thanks for the reply. What if we are comparing only among high-speed trains? IGF2 is typically activated in embryonic tissues and overexpressed in tumorigenesis. When defining the contrast as tumor - control, we attribute IGF2 overexpression to the tumor group (this also follows from the way we design the contrast, tumor - control; if we design it as control - tumor, we will see downregulation in control as compared to tumor).
However, if we compare more specific conditions, such as tumor subtypes, and observe upregulation and downregulation simultaneously (not sure if that is possible), how can we determine which group (tumor subtype) this deregulation is coming from without prior knowledge? For instance, if we compare tumor_subtype1 - tumor_subtype2 and find a significantly high or low logFC value for a gene, we can infer the magnitude of the expression difference between the groups, but how can we determine which group that gene's (e.g. IGF2) up- or downregulation originates from?
So, is our interpretation always relative to the direction of the contrast (group1 - group2 or group2 - group1) between these groups? How do we ensure that our conclusions about gene expression are accurate when dealing with such specific comparisons?
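To make the question about contrast direction concrete, here is a minimal sketch in Python. It assumes made-up log2 expression values for a single gene and uses a plain difference of group means as a stand-in for logFC; the names `tumor`, `normal`, and `log_fc` are hypothetical, and this is not how a real DGE tool estimates or tests the effect. It only shows that reversing the contrast flips the sign of the logFC while the group means themselves stay the same.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (e.g. IGF2); made-up numbers.
tumor = [8.0, 9.0, 10.0]
normal = [5.0, 6.0, 7.0]


def log_fc(group_a, group_b):
    """Difference of mean log2 expression for the contrast group_a - group_b."""
    return mean(group_a) - mean(group_b)


fc_tumor_vs_normal = log_fc(tumor, normal)   # contrast: tumor - normal
fc_normal_vs_tumor = log_fc(normal, tumor)   # contrast: normal - tumor

print("tumor - normal:", fc_tumor_vs_normal)
print("normal - tumor:", fc_normal_vs_tumor)
print("mean tumor:", mean(tumor), "mean normal:", mean(normal))
```

Under these assumptions the two contrasts carry the same information: the sign only says which group is higher relative to the other, not which group has "changed".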
If you have tumor subtype information, you can run tumor_subtype1 vs. Control and tumor_subtype2 vs. Control in parallel. If you have bulk data and the subtypes are mixed within your samples, you have no way to know.
So the orientation of the contrast does not really tell us about the origin of deregulated genes?
What if we calculate the average gene expression within each group (subtype) prior to DGE analysis and then compare the contrast's values with the average gene expression values per group? We could then check which group's average each value is closer to in order to determine what is really happening.
If you know which samples are subtype1 and which are subtype2, run the DGE in parallel against the controls; you will then see whether subtype1 or subtype2 is up- or downregulated compared to Ctrl. You can also run DGE between subtype1 and subtype2 to see which genes are up- or downregulated in each subtype relative to the other.
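The parallel comparisons described above can be sketched as follows. This is only an illustration under stated assumptions: the log2 expression values are made up, the group labels in `expr` and the helper `contrast` are hypothetical, and a difference of group means stands in for the logFC that a real DGE tool (for example limma, edgeR, or DESeq2) would estimate together with a statistical test.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene in three groups; made-up numbers.
expr = {
    "control": [5.0, 6.0, 7.0],
    "subtype1": [8.0, 9.0, 10.0],
    "subtype2": [5.5, 6.0, 6.5],
}

# Mean log2 expression per group.
group_means = {group: mean(values) for group, values in expr.items()}


def contrast(group_a, group_b):
    """Difference of mean log2 expression for the contrast group_a - group_b."""
    return group_means[group_a] - group_means[group_b]


results = {
    "subtype1 - control": contrast("subtype1", "control"),
    "subtype2 - control": contrast("subtype2", "control"),
    "subtype1 - subtype2": contrast("subtype1", "subtype2"),
}

for name, value in results.items():
    print(name, value)
```

In this toy example the subtype1 - subtype2 contrast alone only says that subtype1 is higher than subtype2. The two comparisons against control add the missing reference point: subtype1 is higher than control, while subtype2 is at the control level.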