Question

How is logFC and p-value estimated (for volcano plots)?

1

Entering edit mode

7.6 years ago

LRStar ▴ 200

I am trying to understand how volcano plots are made in DEG studies.

Say I have an example data frame that begins with the following first three rows:

head(myDF)
              ID   C_S1_R1   C_S1_R2   C_S1_R3  C_S2_R1  C_S2_R2  C_S2_R3 
1 Glyma0002s50.1 4.9191791 5.2691071 4.8815068 5.172160 4.869919 5.435079
2 Glyma0003s50.1 6.4923718 7.3523922 7.1775645 5.984183 5.552393 5.920057
3 Glyma0005s50.1 0.8024444 1.8061819 2.4128941 3.508926 2.429191 2.024571

One pipeline a bioinformatician might follow is as follows:

d <- DGEList(counts = myDF[,2:7], group = c(rep("S1", 3), rep("S2", 3)), genes = myDF[,1])
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
d <- estimateTrendedDisp(d)
de <- exactTest(d, pair=c("S1", "S2"), dispersion = "tagwise")
deDF <- as.data.frame(de)

At this point, I believe the data frame is ready to be constructed into a volcano plot - because it contains a column for the logFC and a column for the p-values:

head(deDF)
  table.logFC table.logCPM table.PValue comparison          genes
1  0.06705792     4.730629    1.0000000         S1 Glyma0002s50.1
2 -0.23335090     4.977254    0.7492599         S2 Glyma0003s50.1
3  0.65611207     3.962589    0.5810553         S1 Glyma0005s50.1

I could then run a command like:

qplot(data=deDF, table.logFC, table.PValue)

to create the volcano plot, where each dot in the plot represents a gene and its logFC and p-value.

I have 3 main questions about this type of approach:

Why does the comparison column in deDF structure switch between S1 and S2 in alternating rows? I would have thought the information in this column was the result of both S1 and S2 comparisons for each row anyway.
How is the table.logFC column calculated? I tried to replicate the table.logFC values in deDF object from the read counts of the same rows of the myDF object, but cannot figure it out. Any advise would be appreciated!
How is the table.PValue column calculated? I imagine it would be a difference between the means of the S1 and S2 groups?

If anyone has any ideas toward any of these 3 questions, I would be happy/grateful to hear them!

R edgeR fold-Change • 6.0k views

ADD COMMENT • link updated 14 months ago by Ram 44k • written 7.6 years ago by LRStar ▴ 200

1

Entering edit mode

And have you read the edgeR paper?

ADD REPLY • link 7.6 years ago by WouterDeCoster 47k

0

Entering edit mode

Did you use the raw data or a normalised version of your data?

ADD REPLY • link 7.3 years ago by Spacebio ▴ 200

0

Entering edit mode

I used a normalized version of my data. Thank you.

ADD REPLY • link 7.3 years ago by LRStar ▴ 200

0

Entering edit mode

what is the need for padjacent value in volcano plot

ADD REPLY • link 6.7 years ago by jha.barkha8 ▴ 10

0

Entering edit mode

Please use ADD COMMENT or ADD REPLY to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

This question is off topic and would be more appropriate in a new thread, but please spend some more time thinking about it and write a slightly longer question.

ADD REPLY • link 6.7 years ago by WouterDeCoster 47k

score 4 · Answer 1 · 2017-04-04

As WDC implied, the answers to your three questions is in the paper and documentation of edgeR. You should read these thoroughly. However your interpretation of the p-value is completely wrong here :

I imagine it would be a difference between the means of the S1 and S2 groups?

If the fold-change reflects the variability (the ratio, actually) between the means of the groups, the p-value reflects the significance of that variability in regards to the variability within the groups. You should at least understand that before you even start thinking about building a volcano plot.