I am trying to understand how volcano plots are made in DEG studies.
Say I have an example data frame that begins with the following first three rows:
head(myDF)
ID C_S1_R1 C_S1_R2 C_S1_R3 C_S2_R1 C_S2_R2 C_S2_R3
1 Glyma0002s50.1 4.9191791 5.2691071 4.8815068 5.172160 4.869919 5.435079
2 Glyma0003s50.1 6.4923718 7.3523922 7.1775645 5.984183 5.552393 5.920057
3 Glyma0005s50.1 0.8024444 1.8061819 2.4128941 3.508926 2.429191 2.024571
One pipeline a bioinformatician might follow is as follows:
d <- DGEList(counts = myDF[,2:7], group = c(rep("S1", 3), rep("S2", 3)), genes = myDF[,1])
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
d <- estimateTrendedDisp(d)
de <- exactTest(d, pair=c("S1", "S2"), dispersion = "tagwise")
deDF <- as.data.frame(de)
At this point, I believe the data frame is ready to be constructed into a volcano plot - because it contains a column for the logFC and a column for the p-values:
head(deDF)
table.logFC table.logCPM table.PValue comparison genes
1 0.06705792 4.730629 1.0000000 S1 Glyma0002s50.1
2 -0.23335090 4.977254 0.7492599 S2 Glyma0003s50.1
3 0.65611207 3.962589 0.5810553 S1 Glyma0005s50.1
I could then run a command like:
qplot(data=deDF, table.logFC, table.PValue)
to create the volcano plot, where each dot in the plot represents a gene and its logFC and p-value.
I have 3 main questions about this type of approach:
- Why does the comparison column in deDF structure switch between S1 and S2 in alternating rows? I would have thought the information in this column was the result of both S1 and S2 comparisons for each row anyway.
- How is the table.logFC column calculated? I tried to replicate the table.logFC values in deDF object from the read counts of the same rows of the myDF object, but cannot figure it out. Any advise would be appreciated!
- How is the table.PValue column calculated? I imagine it would be a difference between the means of the S1 and S2 groups?
If anyone has any ideas toward any of these 3 questions, I would be happy/grateful to hear them!
And have you read the edgeR paper?
Did you use the raw data or a normalised version of your data?
I used a normalized version of my data. Thank you.
what is the need for padjacent value in volcano plot
Please use
ADD COMMENT
orADD REPLY
to answer to previous reactions, as such this thread remains logically structured and easy to follow. I have now moved your reaction but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.This question is off topic and would be more appropriate in a new thread, but please spend some more time thinking about it and write a slightly longer question.