Hello everyone, I've been working on an RNAseq dataset, where I infected cells with virus and extracted RNA at different timepoints (3hpi, 6hpi, 9hpi, 24hpi) after infection to study DEG changes over time. I compare the 3hpi, 6hpi and 9hpi samples all to a 0hpi control sample, but although I set the p.value cutoff as 0.05 when I visualise the DEGs with Volcanoplot, it seems the cutoff is different for each of the contrasts. Is that normal?
Here is my R code (from after the lmFit() function) for the first thrre timepoints. For 24hpi the code is essentially the same, except that I compare 24hpi to a mock infected sample 24hpi, since this sample and the first three are from two separate experiments, but with the same virus and cells:
contr <- makeContrasts("3hpi" = group3-group0, "6hpi" = group6-group0, "9hpi" = group9-group0, levels = mm)
fit2 <- contrasts.fit(fit, contr)
fit3 <- limma::eBayes(fit2)
Results <- decideTests(fit3, method= "separate", adjust.method="BH", p.value=0.05, lfc = log2(1.5))
plotSA(fit3, main = "Final Model: Mean-variance Trend") #Final Model: Mean-variance Trend
summary(Results)
head(fit3)
#The Results object because it is not a data frame with named columns.
#Instead, you can use square brackets ([ ]) to subset the column.
table(Results[, '9hpi'])
unique(volcano_data$Significant)
#replace with whatever timepoint
comparison_name <- "9hpi"
# Convert levels in Results to characters
Results_char <- as.character(Results[, comparison_name])
# Create a data frame with log2 fold changes and -log10(p-values)
volcano_data <- data.frame(
logFC = fit3$coefficients[, comparison_name],
logPvalue = -log10(fit3$p.value[, comparison_name]),
Significant = as.character(Results_char) # Convert to character
)
# Set the desired p-value cutoff
p_value_cutoff <- 0.05
# Create a volcano plot using ggplot2
ggplot(volcano_data, aes(x = logFC, y = logPvalue, color = Significant)) +
geom_point(alpha = 0.5) +
scale_color_manual(values = c("-1" = "#377DC5", "0" = "grey", "1" = "#FF5733"), name = "Significant") +
theme_minimal() +
labs(
title = "0hpi vs 9hpi",
x = "log2 Fold Change",
y = "-log10(p-value)"
)+
# Add horizontal line for p-value cutoff
geom_hline(yintercept = -log10(p_value_cutoff), linetype = "dashed", color = "gray")
Also, here the resulting volcanoplots
I would be incredibly grateful if anyone knew whether those seemingly differing p-value cutoffs are normal when doing separate comparisons like this.
Cheers and thanks in advance for reading!
Dear Gordon, thank you for your quick response! I would like to take this opportunity and express my sincere gratitude to you. Limma is an amazing tool, and your willingness to constantly guide and help so many people with their bioinformatic analyses drives forward so many studies and contributes greatly to the scientific community. So, thank you!
Regarding your answer, I am glad to hear that the differing cutoffs are due to the method="separate" parameter. I understood from the limma manual that method="separate" is the most common to use, but given your expertise, Gordon, would you say that this method="separate" is also the appropriate parameter for my experimental setup? For each timepoint I basically had three separate cell culture flasks (representing biological replicates). The cells from the first experiment (0hpi, 3hpi, 6hpi, 9hpi) all originated from the same flask and were then split into twelve cell cultures and infected separately (or not infected in case of 0hpi), and 3,6,9hpi were all compared to 0hpi. Then, I have the other setup, where I infected cells for 12 and 24 hours, and had mock control cultures incubated for 12hpi and 24hpi, for respective comparison. I performed the RNAseq pipeline for both these experiments separately, so I would have assumed also method="separate" to be the feasible parameter here, but your insight on this would be highly appreciated.
The choice of separate vs global FDR control depends on how you intend to interpret the results. It does not depend on whether you prepared samples in separate flasks or not.
method="separate" and method="global" are both valid methods. Personally, for a time course like yours, I would be plotting the time course of genes over time instead of making a volcano plot for each time separately, and I might use an onverall F-test for the time trend instead of separate t-tests for each treatment time. However, I don't know the background to your experiment and there are many valid ways to do the analysis.
The second experiment is quite different and would be analyzed by compared infected to mock.
On a slightly different issue, I am not a fan of volcano plots of limma results because they lose important information about the expression level of each gene. Plotting fold-changes without knowing the baseline expression is potentially misleading. I always use mean-difference plots or time-course plots in my own publications.
Thank you, Gordon, for taking the time to read my comment and answering in such detail. I will think about how to go on with the analysis of the data, also with regard to your recommendations. Your insight has been very helpful, thanks again!