Question

Showing FDR and log2FC in Volcano plot using R

2

Entering edit mode

7.4 years ago

Farbod ★ 3.4k

Dear Biostars, Hi

In order to bioinformaticly show DEGs of RNA-seq in a volcano plot, I intend to use R, showing DEGs with blue spots and not significantly expressed genes with red dots.

The edgeR package which is embedded in Trinity software, uses FDR (>0.001) and log2FC (>2) of the matrix file as the threshold.

I will show my R code and the head of my matrix file. My QUESTIONs are:

1- Which I should use, FDR or -log10(FDR) ? (e.g in aes(x=logFC, y =-log10(FDR),) (when I used FDR the volcano was funny and when I used -10log(FDR), it seems that the number of blue dots that are presenting the DEGs are more than number of DEGs reported by Trinity package.

2- which I should use; lof FC or log2FC?

NOTE1: The Code

library(ggplot2)
data <- read.table("trans_counts.matrix.A_vs_B.edgeR.DE_results", header=TRUE)

##Identify the genes that have a FDR < 0.001
data$threshold = as.factor(data$FDR < 0.001)

##Construct the plot object
g <- ggplot(data=data, 
            aes(x=logFC, y =-log10(FDR), 
                colour=threshold)) +
  geom_point(alpha=0.4, size=1.75) +
  xlim(c(-14, 14)) +
  xlab("log2 fold change") + ylab("-log10 FDR") +
  theme_bw() +
  theme(legend.position="none")
g

NOTE2: Head of file

> external_gene_name    logFC   logCPM  P.Value FDR
> 
> TRINITY_DN179827_c0_g2_i4 -13.8732718193368   4.61855606743036    3.03811154383156e-30    1.87897780840196e-24
> 
> TRINITY_DN194528_c5_g3_i4 13.2697070220115    4.01618042373378    1.21936893870205e-27    3.7707094407506e-22
> 
> TRINITY_DN192883_c9_g2_i2 11.2269211011449    3.82178000481344    3.20153046019011e-26    6.60015780727773e-21

R RNA-Seq volcano plot • 15k views

ADD COMMENT • link updated 5.9 years ago by zx8754 12k • written 7.4 years ago by Farbod ★ 3.4k

0

Entering edit mode

I think your logFC values are is already log2. In general, volcano plots are logFC vs P-value (-logged (10), in most cases). There are manuscripts that used FDR in volcano plots. Almost similar post here: Volcano plot p-value or FDR?

ADD REPLY • link 7.4 years ago by cpad0112 21k

0

Entering edit mode

Thanks, In "aes(x=logFC, y =-log10(FDR)" I should use FDR or -log10(FDR) ?

ADD REPLY • link 7.4 years ago by Farbod ★ 3.4k

1

Entering edit mode

It makes sense (to me) to use -log10 (FDR) instead of FDR. Talk to your PI or client or the intended journal, once you are done with it and iterate it as per the feedback. Please label the axis accordingly.

ADD REPLY • link 7.4 years ago by cpad0112 21k

0

Entering edit mode

For question 1, I think the problem is you set the threshold on the original FDR values (data$FDR < 0.001), but then apply it to transformed values (y =-log10(FDR))

ADD REPLY • link 7.4 years ago by h.mon 35k

0

Entering edit mode

Yes,

In original research I used FDR = 0.001 and log2FC =2 and I want to draw same volcano.

but

when I set the threshold as "data$threshold = as.factor(data$-log10(FDR) < 0.001) " it says :"Error in eval(expr, envir, enclos) : object 'threshold' not found"

when I use "data$threshold = as.factor(data$P.Value < 0.001)" and " aes(x=logFC, y =-log10(FDR), ", the volcano is appeared.

you can also see that in Trinity, volcano is according to -log10FDR.

What do you think? what should I do ?

ADD REPLY • link 7.4 years ago by Farbod ★ 3.4k

0

Entering edit mode

data$threshold = as.factor(-log10(data$FDR) < 3)

Try in these lines.

ADD REPLY • link 7.4 years ago by cpad0112 21k

0

Entering edit mode

Hi, I tried it,

The blue spots change to red and red dot changed to blue!

ADD REPLY • link 7.4 years ago by Farbod ★ 3.4k

1

Entering edit mode

my bad. It should be >, not <.

data$threshold = as.factor(data$FDR < 0.001)
data$threshold = as.factor(-log10(data$FDR) > 3)

they must be same. Looking back, this should not make any difference. Your original code should still be good as threshold (T/F) is on a list of genes.

ADD REPLY • link 7.4 years ago by cpad0112 21k

0

Entering edit mode

Hi @cpad0112, I have another question and SORRY if it is not a bioinformatic one,

Is there any way to change these red and blue colours of dots (e.g into red and green)?

as I could not guess which part of my R script is resbomsible for that by looking at it's line of command.

Thanks

ADD REPLY • link 7.3 years ago by Farbod ★ 3.4k

1

Entering edit mode

Try adding following line ( as a separate line)

scale_color_manual(values = c("green", "red")) +

ADD REPLY • link 7.3 years ago by cpad0112 21k

0

Entering edit mode

hi cpad0112,

I would also like to draw a volcano map showing my diff expressed transcripts. But first, I want to filter my data from edgeRglm. After running edgreglm, i got logfc, logcpm, pval, & adjfdr val. I would like to log2 the logfc and logcpm values. How to do that? Or, is my logfc and logcpm already in log2 state?

ADD REPLY • link 6.8 years ago by bioinfool ▴ 20

0

Entering edit mode

I am not sure about your data output. But in general, most of them output log transformed vales for fold change calculation. Please note that volcano plot is between lfc (fold changes) and p/adj p values.

ADD REPLY • link 6.8 years ago by cpad0112 21k