I have a question regarding a GEO dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92776. The page says that this dataset contains expression profiling by RT PCR. If I download the raw non-normalized data at the bottom of the page, would the values be gene counts, Ct values, or delta Ct values? Could anyone advise on how to convert the numbers into a format that limma/DESeq2 can use, or how else I could run a differential expression analysis on them to get log2FoldChange values?
Batch corrected delta Ct values are reported. Ct values were reported by the AppliedBiosystems SDS software. Delta Ct values were computed using the median of endogenous control transcripts. Finally, Delta Ct values were batch corrected using a linear model.
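So delta Ct values. You would not use these values in limma or DESeq2; you would process them as you would any other qPCR dataset, which is to subtract the delta Ct values of one group from those of the other. The result, delta delta Ct, is the log2 fold-change up to sign: with the usual convention delta Ct = Ct(target) - Ct(reference), a higher delta Ct means lower expression, so log2FC = -(delta delta Ct).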
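As a rough sketch of that subtraction, assuming delta Ct = Ct(target) - Ct(reference) and simple group means: the numbers and group names below are made up for illustration and are not taken from GSE92776.

```python
from statistics import mean

# Hypothetical delta Ct values for one gene (illustrative numbers only).
dct_flare = [5.0, 6.0, 7.0]         # e.g. Group A patients with flare
dct_control = [8.0, 9.0, 7.0, 8.0]  # e.g. control group


def log2_fold_change(dct_case, dct_ctrl):
    """Return (delta delta Ct, log2 fold-change) for case vs. control."""
    ddct = mean(dct_case) - mean(dct_ctrl)
    # Lower Ct means more transcript, so the sign flips for log2FC.
    return ddct, -ddct


ddct, log2fc = log2_fold_change(dct_flare, dct_control)
print(f"ddCt = {ddct}, log2FC = {log2fc}, fold change = {2 ** log2fc}")
```

Unequal group sizes are not a problem here, since each group is reduced to its own mean before the subtraction.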
Hi Ryan,
Thank you so much for your reply, I was completely confused about this. If the values reported are delta Ct values, what exactly does "computed using the median of endogenous control transcripts" mean? I guess I am unable to understand what, for instance, the delta Ct value for Gene A from a patient in Group A with the disease flare would mean. (Is it the Ct value with that of some reference gene subtracted from it?)
Also, if I wanted to compare the logFC between Group A patients with flare (1 sample per patient) and the control group (2 samples per patient), how could I go about doing this? The data looks something like this:
The numbers of patients in Groups C and A are different. Would you suggest that, for this gene, I average the delta Ct values for the Group A flare patients, do the same for Group C, and then take the difference to get the logFC?
I apologize if my explanation is a little messy; I have no clue how to look for differentially expressed genes in data of this format. Thank you.
The "median of endogenous control transcripts" is the reference Ct value: for each sample, the median Ct of the endogenous control transcripts is subtracted from each gene's Ct to give its delta Ct. Since you're unfamiliar with qPCR, I suggest you read a blog post or two on how it works. That should clarify things easily enough.
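To make that concrete, here is a minimal sketch of how a delta Ct could be derived from raw Ct values for a single sample. The gene names, control names, and Ct values are hypothetical, and this ignores the batch correction that was applied to the GEO data.

```python
from statistics import median

# Hypothetical raw Ct values for one sample (illustrative only).
sample_ct = {
    "GENE_A": 27.5,
    "GENE_B": 24.0,
    "CTRL_1": 18.0,
    "CTRL_2": 19.0,
    "CTRL_3": 20.5,
}
controls = ["CTRL_1", "CTRL_2", "CTRL_3"]  # assumed endogenous controls


def delta_ct(ct_by_gene, control_genes):
    """Subtract the median control Ct from every non-control gene's Ct."""
    ref = median(ct_by_gene[g] for g in control_genes)
    return {g: ct - ref for g, ct in ct_by_gene.items() if g not in control_genes}


dct = delta_ct(sample_ct, controls)
print(dct)
```

So a delta Ct of 8.5 for GENE_A would mean that it crossed the detection threshold 8.5 cycles later than the median control transcript in that sample.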
If delta Ct values are approximately normally distributed, I wonder why it would not be appropriate to use limma or a standard t-test.
Normally you could; I'm just not sure how the batch-corrected values are distributed.
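If the values do look roughly normal, a two-group comparison on delta Ct could be sketched as below. This only computes a Welch t statistic and its degrees of freedom with the standard library; a p-value would need a t distribution (from SciPy, for example), and limma would additionally moderate the variances across genes. The values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical delta Ct values for one gene (illustrative only).
dct_flare = [5.0, 6.0, 7.0]
dct_control = [7.0, 8.0, 9.0]

t_stat, df = welch_t(dct_flare, dct_control)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A histogram or Q-Q plot of the batch-corrected values would be a sensible check before trusting a test like this.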
Thank you Ryan, I shall read up on this.