Question

Understanding the output of Negative Binomial in DESeq2

3.2 years ago

synat.keam ▴ 100

Dear Seniors,

I hope you are all doing well. I am very new to RNA-seq and DESeq2. I know that the negative binomial (Gamma-Poisson) distribution is used to model RNA-seq count data. As I understand it, every gene is fitted across the two conditions, giving a baseMean and a log2 fold change, and a Wald test then checks whether the coefficient is equal to zero, which determines whether the log2 fold change is significant. So far I am more familiar with linear models than with generalized linear models, although I have done a bit of Poisson regression before.

I have attached the output of the model after fitting with DESeq2. In my output, five genes are upregulated and two genes are downregulated. Is zero the default log2 fold change threshold for calling a gene up- or downregulated?

Does this mean that there are seven genes in total with a significant adjusted p-value? I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can skim through the output to see how many coefficients are significant. However, with the negative binomial model in DESeq2, I could not find an overall p-value at all.

Also, the output does not list all the genes and their adjusted p-values because so many of them have been fitted. How can I find out which genes have a significant log2 fold change from the output? I hope you do not mind my questions, as I am very new to RNA-seq experiments and their analysis.

Additionally, I understand how to interpret a volcano plot, but I am wondering whether all genes are used in the volcano plot. I have attached the plot; there do not seem to be many points, so I assume only some genes were used to construct it. Am I right? Do you think the plot looks all right? Sorry for asking, and I look forward to hearing from the author and seniors at your earliest convenience.

Kind Regards,

Synat

DESeq2 RNASeq

score 0 · Answer 1 · 2021-09-20

I have attached the output of the model after fitting with DESeq2. In my output, five genes are upregulated and two genes are downregulated. Is zero the default log2 fold change threshold for calling a gene up- or downregulated?

Yes, the default null hypothesis is a log2 fold change of zero. If a gene's adjusted p-value is below the FDR threshold (0.1 by default) and its log2FoldChange is > 0, we call it "upregulated"; if it is < 0, "downregulated". You have 5 up- and 2 downregulated genes at the default FDR cutoff of 0.1. You are free to set the cutoff to 0.05 or any other value you are comfortable with.
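
As a rough illustration of that rule, the counting can be sketched in Python. This is not DESeq2 code (DESeq2 is an R package); the gene names, the numbers and the `classify` and `count_calls` helpers are made up for the example, and `None` stands in for the NA adjusted p-values that DESeq2 reports for some genes.

```python
# Hypothetical results rows: (gene, log2FoldChange, padj). All values are invented.
rows = [
    ("geneA", 2.1, 0.003),
    ("geneB", -1.4, 0.04),
    ("geneC", 0.3, 0.62),
    ("geneD", 1.0, None),   # padj is NA, so the gene is never called significant
    ("geneE", 0.8, 0.09),
]

def classify(lfc, padj, alpha=0.1):
    """Label a gene 'up', 'down' or 'ns' (not significant) at FDR cutoff alpha."""
    if padj is None or padj >= alpha:
        return "ns"
    if lfc > 0:
        return "up"
    if lfc < 0:
        return "down"
    return "ns"

def count_calls(rows, alpha=0.1):
    """Count up-, downregulated and non-significant genes."""
    counts = {"up": 0, "down": 0, "ns": 0}
    for _gene, lfc, padj in rows:
        counts[classify(lfc, padj, alpha)] += 1
    return counts

print(count_calls(rows, alpha=0.1))   # counts at the default cutoff
print(count_calls(rows, alpha=0.05))  # a stricter cutoff calls fewer genes
```

Changing `alpha` only changes which genes pass the cutoff; the sign of the log2 fold change still decides between "up" and "down".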

Does this mean that there are seven genes in total with a significant adjusted p-value? I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can skim through the output to see how many coefficients are significant. However, with the negative binomial model in DESeq2, I could not find an overall p-value at all.

The results table contains the coef/name/contrast you specified. If none was specified then, as far as I know, the last element of resultsNames(dds) is used. The manual describes extensively how to set coefs and contrasts; please read it.

Additionally, I understand how to interpret a volcano plot, but I am wondering whether all genes are used in the volcano plot. I have attached the plot; there do not seem to be many points, so I assume only some genes were used to construct it. Am I right? Do you think the plot looks all right?

All the plot shows is that you have hardly any significant genes. You should narrow the x-axis, as it is far too wide.

score 0 · Answer 2 · 2021-09-20

Is zero the default log2 fold change threshold for calling a gene up- or downregulated?

Yes, by default, the null hypothesis is that the log2FoldChange is zero.

Does this mean that there are seven genes in total with a significant adjusted p-value?

Yes, you are interpreting your results correctly that you have 7 genes in total where the log2FoldChange is significantly different from 0.
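
For readers wondering what "adjusted" means here: DESeq2 uses the Benjamini-Hochberg procedure by default. The sketch below shows that adjustment on a handful of invented p-values; the `bh_adjust` helper is written for this example and skips the filtering DESeq2 applies before adjusting.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # invented raw p-values, one per gene
padj = bh_adjust(pvals)
n_significant = sum(p < 0.1 for p in padj)
print(padj, n_significant)
```

Each gene gets its own adjusted p-value, so the number of significant genes is simply the number of adjusted values below the chosen cutoff.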

I recall that when a multiple linear regression is fitted, we get an overall p-value, so we can quickly tell whether at least one coefficient is not equal to zero (p-value < 0.05), or I can skim through the output to see how many coefficients are significant.

This depends on how you have set up the design in DESeq2. In a basic differential expression analysis, where you just have two groups to compare, DESeq2 will fit 36,955 independent negative binomial models, each with only a single coefficient (actually two coefficients, including the intercept). The p-value reported for each gene is the p-value for the null hypothesis that the value of that single coefficient for that gene is zero.
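
To make the per-gene test concrete: the Wald statistic is the estimated log2 fold change divided by its standard error, compared against a standard normal distribution. Here is a minimal Python sketch of that arithmetic for one gene. The numbers are invented and `wald_test` is a hypothetical helper, not part of DESeq2.

```python
import math

def wald_test(log2fc, lfc_se):
    """Return (z, two-sided p-value) for H0: log2 fold change == 0."""
    z = log2fc / lfc_se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# One invented gene: estimate 1.5 with standard error 0.5
z, p = wald_test(1.5, 0.5)
print(z, p)
```

In a DESeq2 results table, these quantities correspond to the log2FoldChange, lfcSE, stat and pvalue columns, with one such row per gene.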

If your design includes more than one experimental factor, or a factor with more than two levels, you will end up with 36,955 negative binomial models with more than one coefficient each. If you are using the Wald test, there is no overall p-value. Instead, the p-value reported is determined by the value you give for contrast or name in your call to results (coef is the corresponding argument of lfcShrink). You can find valid values for the name parameter by running resultsNames on your DESeq object.

If you specifically want an overall p-value (say you have three different treatments and want to know whether treatment, overall, has any effect), then you need to use the likelihood ratio test rather than the Wald test. See http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test
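
The idea behind the likelihood ratio test can be sketched as follows: for one gene, compare the log-likelihood of the full model with that of a reduced model without the treatment terms. The log-likelihood values below are invented and the helpers are hypothetical. In DESeq2 itself the test is requested in R with `DESeq(dds, test = "LRT", reduced = ~ 1)`, as described in the linked vignette.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square upper tail for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    total, term = 0.0, 1.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)  # next term of the series
    return math.exp(-half) * total

def lrt_pvalue(loglik_full, loglik_reduced, df):
    """Return (statistic, p-value); df = number of coefficients dropped."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return stat, chi2_sf_even_df(stat, df)

# Three treatments vs intercept only: the reduced model drops 2 coefficients.
stat, p = lrt_pvalue(loglik_full=-100.0, loglik_reduced=-104.0, df=2)
print(stat, p)
```

A single small p-value here says that treatment as a whole has some effect on this gene, without saying which pair of treatments differs. The closed form is limited to even degrees of freedom to keep the sketch free of dependencies.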

However, I am wondering whether all genes are used in the volcano plot.

I don't know EnhancedVolcano inside out, but I believe it does plot all genes. It looks like it has too few points because so many points are in the same place.