I'm learning to analyze GEO data, and GEO2R seems the most straightforward tool to me, but I'm having trouble understanding how GEO2R (or limma) calculates the logFC.
"First define the treated group (it will be colored in blue), then define the control group (it will be colored in pink). The order is important for calculating log fold changes later in the analysis"
It seems to me that GEO2R calculates logFC as log(control/treated) (or log(normal/tumor) in this case).
However, I've read in several sources that logFC = log(treated/control).
This confuses me when I interpret gene expression levels. Could you please help explain this?
The logFC (strictly, the log base 2 fold change) calculation can have anything as the numerator and anything as the denominator. It is your role as the analyst to define these. A positive value then means higher expression in whichever group you chose as the numerator.
I also emailed GEO2R team and they've just replied. For anyone who has the same question:
"The first group that is named in the 'Define groups' drop-down menu becomes the denominator in logFC, and the second group that is named becomes the numerator. The background pink/blue colors are not important.
Reversing the order in which you name groups will result in a reversed sign in logFC (e.g., 4 vs. -4)."
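To make the sign flip concrete, here is a minimal base R sketch. The two expression values are made up purely for illustration and are not taken from any GEO dataset.

```r
# Hypothetical expression values for one gene (illustration only)
control <- 100
treated <- 400

# Treated as numerator, control as denominator
lfc_treated_over_control <- log2(treated / control)   # 2

# Same data with the roles swapped: only the sign changes
lfc_control_over_treated <- log2(control / treated)   # -2

# On log2-scale data (what limma works with), a ratio becomes a difference
lfc_as_difference <- log2(treated) - log2(control)

print(c(lfc_treated_over_control, lfc_control_over_treated, lfc_as_difference))
```

So a logFC of 2 and a logFC of -2 can describe exactly the same data; what matters is knowing which group was used as the denominator.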
Hi Kevin
Thank you for your comment. I agree that the choice of numerator and denominator is up to us. I was just confused because GEO2R always seems to take Pink/Blue (control/treatment), while some people say it is always treatment/control. But it's useful to know there's no fixed rule for the log2FC calculation.
I believe you can change the order via GEO2R, but I do not use it too much.
Hello,
I have two groups, pancreatic cancer and control, which I defined in GEO2R via the Define groups menu: 1. (blue): Pancreatic cancer 2. (pink): Control
According to your message, the result should be Pancreatic Cancer / Control. Is that right? I see the code below in the R script that GEO2R generates. What does it mean, and do G1 and G0 correspond to the control or the treated group? Is G1 / G0 = Pancreatic Cancer / Control?
cont.matrix <- makeContrasts(G1-G0, levels=design)
gsms <- paste0("XXXXXXXXXXXXXXXXXXXXXX1111111111111111111111111110", "00000000000000000000000000000000000000000000000000", "00000000000000000000000000000000000000000000000000", "00000000000000000000000000000000000")
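As far as I understand GEO2R-generated scripts, each character of gsms stands for one sample in series order: X marks an excluded sample and a digit is the index of the group the sample was assigned to, which the script turns into a factor level such as G0 or G1. The sketch below is base R only and decodes the string above under that assumption. The log2_expr vector is simulated, not real data from the series. Which biological group is G0 depends on the order in which the groups were named; going by the GEO2R team's reply quoted earlier, the first group named would be G0 (the denominator).

```r
gsms <- paste0("XXXXXXXXXXXXXXXXXXXXXX1111111111111111111111111110",
               "00000000000000000000000000000000000000000000000000",
               "00000000000000000000000000000000000000000000000000",
               "00000000000000000000000000000000000")

# One character per sample; drop the samples marked X (excluded)
sml  <- strsplit(gsms, split = "")[[1]]
keep <- sml != "X"
fl   <- factor(paste0("G", sml[keep]))

# How many samples ended up in each group
print(table(fl))

# Design matrix with one column per group, as used by makeContrasts()
design <- model.matrix(~ 0 + fl)
colnames(design) <- levels(fl)

# The contrast G1-G0 estimates, per gene, the mean log2 expression in G1
# minus the mean in G0. Simulated log2 values for a single gene:
set.seed(1)
log2_expr <- rnorm(length(fl), mean = 8)
logFC_G1_vs_G0 <- mean(log2_expr[fl == "G1"]) - mean(log2_expr[fl == "G0"])
print(logFC_G1_vs_G0)
```

Under these assumptions, a positive logFC from G1-G0 means higher expression in the samples labelled 1, and a negative one means higher expression in the samples labelled 0.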
Thanks
If you have skills in R, I would encourage you to avoid GEO2R and to write your own code instead.
I do not know what G1 and G0 relate to; they could be cell cycle stages. I do know, however, that the code generated by GEO2R can be incorrect or misleading.
If you want further help, please at least post the GEO accession ID of this dataset.
Thank you!
Hi Kevin,
Thank you for your reply. I don't think it is related to cell cycle stage, because when I look at other IDs I see other numbers, such as G3-G4. The GEO ID is GSE24279. Thank you.
I do not see anything related to G0-4 when I go to GSE24279. It is just a case-control study for pancreatic cancer.
I would just obtain the data in R, like this:
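A minimal sketch of one way to do this, assuming the Bioconductor packages GEOquery and Biobase are installed and that the series provides a series matrix file. The helper name fetch_gse is hypothetical, and the download needs internet access.

```r
# Hypothetical helper: download a GEO series as an ExpressionSet.
# Packages are referenced with :: so they are only needed when called.
fetch_gse <- function(accession = "GSE24279") {
  # getGEO() with GSEMatrix = TRUE returns a list of ExpressionSet
  # objects, one per platform in the series
  gse_list <- GEOquery::getGEO(accession, GSEMatrix = TRUE)
  gse_list[[1]]
}

# Example usage (not run here because it downloads data):
# gset <- fetch_gse("GSE24279")
# dim(Biobase::exprs(gset))   # probes x samples
# head(Biobase::pData(gset))  # sample annotation, used to define groups
```

The sample annotation returned by pData() is where you would look up which samples are pancreatic cancer and which are controls, rather than relying on the G0/G1 labels.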
After that, I would conduct my analyses in R using limma.
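As a rough sketch of what that could look like, the example below runs a two-group limma comparison on simulated log2 data, with the direction of the contrast written out explicitly. It assumes the limma package is installed. The matrix, group sizes, and group names are made up for illustration and are not from GSE24279.

```r
library(limma)

set.seed(1)
n_genes <- 100
group <- factor(c(rep("control", 4), rep("cancer", 4)),
                levels = c("control", "cancer"))

# Simulated log2 expression matrix: genes in rows, samples in columns
expr <- matrix(rnorm(n_genes * length(group), mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_along(group))))
# Make the first gene higher in the cancer samples
expr[1, group == "cancer"] <- expr[1, group == "cancer"] + 2

# One design column per group, named after the group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# The contrast states the direction: cancer minus control on the log2
# scale, i.e. cancer is the numerator and control the denominator
contrast <- makeContrasts(CancerVsControl = cancer - control, levels = design)

fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast))
res  <- topTable(fit2, number = Inf, sort.by = "none")

# logFC equals the difference in group means of the log2 values
manual <- rowMeans(expr[, group == "cancer"]) -
  rowMeans(expr[, group == "control"])
print(head(cbind(limma = res$logFC, manual = manual)))
```

Writing the contrast as cancer - control leaves no ambiguity about the sign: positive logFC means higher in cancer. Swapping it to control - cancer would flip every sign and leave the p-values unchanged.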
Thank you Kevin,
I will try to write them. Best regards