Dear Biostars, Hi
In order to bioinformaticly show DEGs of RNA-seq in a volcano plot, I intend to use R, showing DEGs with blue spots and not significantly expressed genes with red dots.
The edgeR package which is embedded in Trinity software, uses FDR (>0.001) and log2FC (>2) of the matrix file as the threshold.
I will show my R code and the head of my matrix file. My QUESTIONs are:
1- Which I should use, FDR or -log10(FDR) ? (e.g in aes(x=logFC, y =-log10(FDR),
) (when I used FDR the volcano was funny and when I used -10log(FDR), it seems that the number of blue dots that are presenting the DEGs are more than number of DEGs reported by Trinity package.
2- which I should use; lof FC or log2FC?
NOTE1: The Code
library(ggplot2)
data <- read.table("trans_counts.matrix.A_vs_B.edgeR.DE_results", header=TRUE)
##Identify the genes that have a FDR < 0.001
data$threshold = as.factor(data$FDR < 0.001)
##Construct the plot object
g <- ggplot(data=data,
aes(x=logFC, y =-log10(FDR),
colour=threshold)) +
geom_point(alpha=0.4, size=1.75) +
xlim(c(-14, 14)) +
xlab("log2 fold change") + ylab("-log10 FDR") +
theme_bw() +
theme(legend.position="none")
g
NOTE2: Head of file
> external_gene_name logFC logCPM P.Value FDR
>
> TRINITY_DN179827_c0_g2_i4 -13.8732718193368 4.61855606743036 3.03811154383156e-30 1.87897780840196e-24
>
> TRINITY_DN194528_c5_g3_i4 13.2697070220115 4.01618042373378 1.21936893870205e-27 3.7707094407506e-22
>
> TRINITY_DN192883_c9_g2_i2 11.2269211011449 3.82178000481344 3.20153046019011e-26 6.60015780727773e-21
I think your logFC values are is already log2. In general, volcano plots are logFC vs P-value (-logged (10), in most cases). There are manuscripts that used FDR in volcano plots. Almost similar post here: Volcano plot p-value or FDR?
Thanks, In "aes(x=logFC, y =-log10(FDR)" I should use FDR or -log10(FDR) ?
It makes sense (to me) to use -log10 (FDR) instead of FDR. Talk to your PI or client or the intended journal, once you are done with it and iterate it as per the feedback. Please label the axis accordingly.
For question 1, I think the problem is you set the threshold on the original FDR values (
data$FDR < 0.001
), but then apply it to transformed values (y =-log10(FDR)
)Yes,
In original research I used FDR = 0.001 and log2FC =2 and I want to draw same volcano.
but
when I set the threshold as "
data$threshold = as.factor(data$-log10(FDR) < 0.001)
" it says :"Error in eval(expr, envir, enclos) : object 'threshold' not found"when I use "
data$threshold = as.factor(data$P.Value < 0.001)
" and "aes(x=logFC, y =-log10(FDR),
", the volcano is appeared.you can also see that in Trinity, volcano is according to -log10FDR.
What do you think? what should I do ?
Try in these lines.
Hi, I tried it,
The blue spots change to red and red dot changed to blue!
my bad. It should be >, not <.
they must be same. Looking back, this should not make any difference. Your original code should still be good as threshold (T/F) is on a list of genes.
Hi @cpad0112, I have another question and SORRY if it is not a bioinformatic one,
Is there any way to change these red and blue colours of dots (e.g into red and green)?
as I could not guess which part of my R script is resbomsible for that by looking at it's line of command.
Thanks
Try adding following line ( as a separate line)
hi cpad0112,
I would also like to draw a volcano map showing my diff expressed transcripts. But first, I want to filter my data from edgeRglm. After running edgreglm, i got logfc, logcpm, pval, & adjfdr val. I would like to log2 the logfc and logcpm values. How to do that? Or, is my logfc and logcpm already in log2 state?
I am not sure about your data output. But in general, most of them output log transformed vales for fold change calculation. Please note that volcano plot is between lfc (fold changes) and p/adj p values.