Hi,
I am currently working with Fluidigm qRT-PCR data. There are 3 plates combined into one file, with each plate consisting of 96 genes (88 target genes + 8 reference genes). In total, that gives 264 target genes plus 8 * 3 = 24 reference-gene measurements (the same 8 reference genes on each plate) across around 45 samples, and each sample was run in technical replicates.
I want to know whether the analysis methodology described below is correct.
A. Handling multiple reference/housekeeping genes
Since I have 8 reference genes in triplicate for each sample in the combined file, I created a data file with these reference genes across all samples:
- Average the on-chip reference genes at each sample level (arithmetic mean of the 8 * 3 reference-gene measurements, giving 8 * 1 reference genes per sample)
- Identify the most stable reference genes (for instance, the top 4) using appropriate in-silico approaches (geNorm, NormFinder, BestKeeper, etc.) based on the M-value
- Create a pseudogene by calculating the geometric mean of the top 4 stable reference genes across samples
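The reference-gene handling in A could be sketched roughly as follows. This is a minimal sketch with simulated Ct values and hypothetical column names; the stability ranking itself would come from geNorm/NormFinder/BestKeeper, so the top-4 list below is just a placeholder:

```python
import numpy as np
import pandas as pd

# Hypothetical example: Ct values for 8 reference genes in triplicate,
# columns named like "RefG1_r1" ... "RefG8_r3", one row per sample (45 samples).
rng = np.random.default_rng(0)
ref_names = [f"RefG{i}" for i in range(1, 9)]
cols = [f"{g}_r{r}" for g in ref_names for r in (1, 2, 3)]
ct = pd.DataFrame(rng.normal(22, 1, size=(45, len(cols))), columns=cols)

# Step 1: arithmetic mean of triplicates -> one Ct per reference gene per sample
ref_mean = pd.DataFrame({g: ct[[f"{g}_r{r}" for r in (1, 2, 3)]].mean(axis=1)
                         for g in ref_names})

# Step 2: stability ranking is done in dedicated tools (geNorm, NormFinder,
# BestKeeper); here we simply assume the top 4 stable genes are known:
top4 = ["RefG1", "RefG3", "RefG5", "RefG7"]  # placeholder ranking

# Step 3: pseudogene = geometric mean of the top-4 stable reference genes,
# computed per sample
pseudogene = np.exp(np.log(ref_mean[top4]).mean(axis=1))
```

Note that Ct values are already on a log2-like scale, so some workflows use the arithmetic mean of Ct values here instead; either way the point is a single per-sample normalization value.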
B. After this: 1. Create a new data file and average (arithmetic mean) the technical replicates across all samples for the remaining genes, i.e. all 264 target genes + the pseudogene (geometric mean of the top 4 reference genes)
For instance,
Detector | Target Gene 1 | Target Gene 2 | ... | Pseudogene
- Calculate △Ct (difference between the target gene and the reference gene, i.e. the pseudogene)
- Calculate △△Ct (difference between the sample's △Ct and the average △Ct of the control samples)
- Calculate 2^(-△△Ct) to evaluate fold changes in gene expression
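The △Ct/△△Ct/fold-change steps in B can be sketched in pandas. Simulated Ct values; the `control` sample names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical example: averaged Ct values (technical replicates already
# collapsed), one row per sample; target-gene columns plus a "Pseudogene" column.
rng = np.random.default_rng(1)
samples = [f"S{i}" for i in range(1, 11)]
data = pd.DataFrame({"Gene1": rng.normal(25, 1, 10),
                     "Gene2": rng.normal(28, 1, 10),
                     "Pseudogene": rng.normal(20, 0.3, 10)},
                    index=samples)
control = ["S1", "S2", "S3"]  # placeholder control-group samples

targets = data.drop(columns="Pseudogene")

# dCt = Ct(target) - Ct(pseudogene), per sample
dct = targets.sub(data["Pseudogene"], axis=0)

# ddCt = dCt(sample) - mean dCt of the control samples, per gene
ddct = dct - dct.loc[control].mean()

# fold change relative to the control group
fold_change = 2.0 ** (-ddct)
```

By construction, the control samples have a mean ddCt of 0 per gene, i.e. a mean fold change centered on 1.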
Please let me know if the above analysis methodology looks fine.
It's unclear what exactly the "83 (triplicate)" is actually referring to, unless you have different but slightly overlapping reference gene sets by sample. Likewise, I'm not sure how taking the mean of the reference genes changes their number (from 83 to 81). Aside from that, the methodology seems fine (namely, creating a stable reference and then using the standard delta-delta Ct method).
Hi Devon,
Thank you. I have reposted the question once again. Earlier, there was a formatting issue with the mathematical signs. Hence, the numbers were not aligned properly.
It's much clearer now; what you're doing looks perfectly correct.
Thank you for the reply Devon.
Hi Devon,
Do you have any suggestions for the questions below? 1. Which statistical test could be used, for instance, to compare significance between Before vs. After groups at multiple time points? 2. What type of plots could be generated for the comparison? 3. Which list (△Ct, △△Ct, or 2^(-△△Ct)) should be used for further downstream analysis such as Gene Ontology, gene enrichment analysis, and pathway analysis?
Thank you. I have another question. In A (handling multiple reference/housekeeping genes), I described selecting the top 4 reference genes based on the different methods (geNorm, NormFinder, etc.) and creating a pseudogene from those 4 genes by geometric mean.
Does averaging all 8 reference genes directly and creating a pseudogene, skipping that selection step, sound fine? In general, calculating △Ct as the difference between each of the 264 target genes and 1 reference gene, where the reference gene is the arithmetic average of the 8 reference genes:
△Ct = Target Gene 1 - Pseudogene (arithmetic avg of the 8 reference genes)
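This simplified scheme could be sketched as follows (simulated Ct values; names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sketch of the simplified scheme: skip the stability-selection
# step and normalize each target against the arithmetic mean Ct of all 8
# reference genes.
rng = np.random.default_rng(2)
ref = pd.DataFrame(rng.normal(20, 0.5, size=(45, 8)),
                   columns=[f"RefG{i}" for i in range(1, 9)])
target = pd.Series(rng.normal(26, 1, 45), name="Gene1")

pseudogene = ref.mean(axis=1)   # arithmetic mean Ct of the 8 reference genes
dct = target - pseudogene       # dCt = Ct(target) - Ct(pseudogene)
```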
Taking the geometric or other mean of those seems reasonable, in effect that's similar to what we do for normalization in RNA-seq.
Agree this seems fine. Perhaps a minor addition: computing △△Ct and 2^(-△△Ct) is of course fine if you are interested in fold-changes vs. a reference group, and it's popular with biologists, but if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*△Ct as your readout is quite valid too.
Hi Ahil,
Please provide more information on this
but if your experimental layout doesn't have an obvious single control group and/or you want to apply downstream analyses like ANOVA to log-scale normalized relative expression values, then just using -1*△Ct as your readout is quite valid too.
We have Before and After types of groups for multiple time points.
We often have multi-group qPCR experimental designs where we simply want a normalized log2-scale relative expression measure, where larger values indicate higher expression. -1*dCt is just that number. In that sense, -1*dCt is analogous to array-based expression measures like RMA or RNA-Seq measures like log2CPM, which our audience is often very used to seeing. For downstream analyses like visualization, clustering, or other supervised or unsupervised analyses, especially when there is not a single obvious reference group, -1*dCt can be a good fit. In that context, ddCt and 2^-ddCt, while of course perfectly valid and familiar to audiences thinking about fold-changes from a reference group, just represent additional derived calculations (log-differencing and linearization) that don't provide any practical benefits in terms of interpretability, suitability for stats analysis, or variance/precision of the readout. While I don't know the details of your experimental design, if your interest is in multiple paired comparisons (before-after) then you could certainly execute paired linear modeling analyses on -1*dCt values.
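A paired before/after comparison on -1*dCt values could be sketched like this. Simulated -dCt values with an assumed effect size, purely for illustration:

```python
import numpy as np

# Hypothetical paired Before/After comparison on -dCt values for one gene.
# -dCt is on a log2 scale, so the mean paired difference is a log2 fold change.
rng = np.random.default_rng(3)
before = -rng.normal(6.0, 0.8, 20)          # -dCt, Before (20 paired samples)
after = before + rng.normal(1.0, 0.5, 20)   # -dCt, After (~1 log2 unit higher)

diff = after - before                       # per-pair log2 fold changes
log2_fc = diff.mean()
# paired t statistic: mean difference over its standard error
t_stat = log2_fc / (diff.std(ddof=1) / np.sqrt(len(diff)))
```

With real data, `scipy.stats.ttest_rel(after, before)` gives the same t statistic along with a p-value.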
Thank you. Indeed it is helpful.
Hi,
When I plot a hierarchical clustering heatmap using the significant genes (-DCt) between the two conditions, Before vs. After, I observe opposite profiles, i.e. Before (red, max) and After (green, min). However, when I run hierarchical clustering on an ANOVA list created from DCt, the profiles are reversed: After (red) and Before (green). I am a bit confused here, as the FC values for those genes are up-regulated in the statistical test. Could you please provide me with some insight on this, and which data should be used for the heatmap, DCt or negative DCt?
Thank you, Best Regards, Toufiq