Hi,
I am trying to see which genes are equally (or conversely, differentially) expressed across two tissue types: Artery and Blood. I need to exclude those which are similarly expressed from my study. I wish to use the Wald test data to help me.
These are the results of the DESeq2 analysis:
Question: For each gene how do I interpret the "stat" and/or other values to accomplish this? I understand everything bar the "stat" value. Is there a specific "stat" value range, or direction I should use as a cut-off? I'm really struggling to make sense of this.
log2 fold change (MLE): Tissue_Type Blood vs Artery
Wald test p-value: Tissue_Type Blood vs Artery
DataFrame with 10147 rows and 6 columns
                      baseMean log2FoldChange     lfcSE      stat       pvalue         padj
                     <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric>
ENSG00000000003.14    33.81022      -4.099087 0.0547015 -74.93560  0.00000e+00  0.00000e+00
ENSG00000000419.12   115.26431      -2.223858 0.0324612 -68.50819  0.00000e+00  0.00000e+00
ENSG00000000457.13    18.71515       1.179049 0.0329148  35.82125 5.15710e-281 1.08739e-280
ENSG00000000460.16     4.40886       0.235817 0.0449273   5.24886  1.53041e-07  1.68755e-07
ENSG00000000938.12  1684.10848       7.874469 0.0523470 150.42842  0.00000e+00  0.00000e+00
...                        ...            ...       ...       ...          ...          ...
ENSG00000284237.1  3.55911e-01      -3.075739 0.1297507 -23.70499 3.20315e-124 5.16753e-124
ENSG00000284308.1  1.74974e+01      -0.393802 0.0494972  -7.95603  1.77644e-15  2.06777e-15
ENSG00000284413.1  1.46580e+01      -3.658038 0.0699289 -52.31082  0.00000e+00  0.00000e+00
ENSG00000284484.1  5.86535e+04      -1.767739 0.0949798 -18.61173  2.58128e-77  3.73497e-77
ENSG00000284526.1  9.62264e+00       4.137284 0.0958993  43.14196  0.00000e+00  0.00000e+00
Many thanks.
Typically, information about a function's output is in the function's manual page, under the "Value" heading. For this object, you can see it here:
This is called the "test statistic". It is a summary computed from your data that can be compared against a null distribution to see how far your result deviates from what would be expected under the "null hypothesis" (see wiki). For the Wald test in DESeq2, "stat" is the log2FoldChange divided by its standard error (lfcSE), so its sign follows the direction of the fold change and its magnitude grows as the estimate becomes more certain. From this comparison you obtain a p-value: the probability of observing a statistic at least as extreme as yours if the null hypothesis were true (e.g. if there were no difference between groups).
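As a rough illustration of how "stat" relates to the other columns, the sketch below recomputes it for one row of the table in the question (ENSG00000000460.16) as log2FoldChange / lfcSE, then converts it to a two-sided p-value under a standard normal null. The function names are made up for this example and only the Python standard library is used. The inputs are the rounded values printed in the table, so the results should agree with the "stat" and "pvalue" columns only approximately.

```python
import math

def wald_stat(log2_fold_change, lfc_se):
    """Wald statistic: the estimate divided by its standard error."""
    return log2_fold_change / lfc_se

def two_sided_p(z):
    """Two-sided p-value for z under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values copied from the ENSG00000000460.16 row of the results table.
stat = wald_stat(0.235817, 0.0449273)
pvalue = two_sided_p(stat)
print(f"stat = {stat:.5f}, pvalue = {pvalue:.3e}")
```

So there is no separate cut-off to learn for "stat": a large absolute value and a small p-value are two views of the same thing.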
That said, to select significant genes one typically filters on the p-value (usually the adjusted one, padj) and on the magnitude of the logFC, rather than on "stat" directly.
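A minimal sketch of that kind of filter, using a few rows copied from the table in the question. The cut-offs (padj < 0.05 and |log2FoldChange| >= 1) are assumptions chosen only for illustration, and the function name is hypothetical.

```python
# Rows copied from the results table: (gene, log2FoldChange, padj).
results = [
    ("ENSG00000000003.14", -4.099087, 0.0),
    ("ENSG00000000419.12", -2.223858, 0.0),
    ("ENSG00000000457.13", 1.179049, 1.08739e-280),
    ("ENSG00000000460.16", 0.235817, 1.68755e-07),
    ("ENSG00000284308.1", -0.393802, 2.06777e-15),
]

# Assumed cut-offs for illustration only; choose your own.
PADJ_CUTOFF = 0.05
LFC_CUTOFF = 1.0  # |log2FC| >= 1 means at least a 2-fold change

def is_differentially_expressed(lfc, padj):
    """Require both statistical significance and a large enough effect."""
    return padj < PADJ_CUTOFF and abs(lfc) >= LFC_CUTOFF

de_genes = [gene for gene, lfc, padj in results
            if is_differentially_expressed(lfc, padj)]
print(de_genes)
```

Note that the last two rows pass the padj cut-off but not the fold-change cut-off: with this many samples, even small differences become statistically significant.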
Papyrus thank you very much.
That was a good answer and it has helped my understanding.
Assuming I have screened the results at the 0.05 padj level, what would you consider a reasonable cut-off range for log2FoldChange if I wanted to identify similarly expressed genes?
Ultimately I will be comparing across 3 different tissue types. I plan to compare A==B, A==C, B==C and screen out that way.
With these tests one usually focuses on identifying differentially expressed genes. I guess you could say that the remaining genes are similarly expressed (or rather, that you don't have evidence of them being differentially expressed). For selecting differentially expressed genes, the logFC cut-off depends on what you consider a biologically meaningful change, etc. (see answers like this one). If, conversely, you want to focus on genes with no changes across groups (not differentially expressed), maybe you would want to be more (less?) conservative and remove more differentially expressed genes, or also look at genes with low variance across all groups, etc.
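One way to make "similarly expressed" a positive statement, rather than just the absence of a significant result, is an equivalence-style check: call a gene similar only if the whole confidence interval of its log2 fold change lies inside a band you consider negligible. The sketch below does this with an approximate 95% interval (estimate ± 1.96 × lfcSE). The ±0.5 band and the function name are assumptions for illustration, and this is not what the Wald test in the output above computes. I believe DESeq2's results() offers altHypothesis = "lessAbs" together with lfcThreshold for a formal test of this kind, but check its manual page before relying on that.

```python
Z_95 = 1.96            # normal quantile for an approximate 95% interval
SIMILARITY_BAND = 0.5  # assumed: |log2FC| below 0.5 counts as negligible

def is_similarly_expressed(lfc, lfc_se, band=SIMILARITY_BAND):
    """True if the whole approximate 95% interval lies inside (-band, band)."""
    lower = lfc - Z_95 * lfc_se
    upper = lfc + Z_95 * lfc_se
    return -band < lower and upper < band

# Rows copied from the results table: (gene, log2FoldChange, lfcSE).
rows = [
    ("ENSG00000000457.13", 1.179049, 0.0329148),
    ("ENSG00000000460.16", 0.235817, 0.0449273),
    ("ENSG00000284308.1", -0.393802, 0.0494972),
]
similar = [gene for gene, lfc, se in rows if is_similarly_expressed(lfc, se)]
print(similar)
```

Under these assumed settings the last two genes count as similar even though their padj values are tiny, which is why a padj cut-off alone is not enough to separate "similar" from "different".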
Papyrus thank you. This is really more useful than you could know!
So if I have 3 tissues, A B C.
and look at LFC for them pairwise by running
And get LFC values (e.g.) of
B-A: -4
C-A: +2
B-C: +6
that might indicate C is being expressed most of all, then B, then A? I have noticed they add up with actual data.
Is my interpretation correct?
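A quick arithmetic check of the example values, assuming each label "X-Y" means the log2 fold change of X relative to Y (so B-A = log2(B/A)). Under that assumption the three pairwise values are tied together by B-C = (B-A) - (C-A), and setting A to 0 gives a relative level for each tissue. The variable names are only for this example.

```python
# Example values from the post; "X-Y" is read as log2(X / Y).
lfc_b_vs_a = -4.0
lfc_c_vs_a = 2.0

# Log ratios subtract: log2(B/C) = log2(B/A) - log2(C/A).
lfc_b_vs_c = lfc_b_vs_a - lfc_c_vs_a

# Relative log2 expression with A as the reference (A = 0).
levels = {"A": 0.0, "B": lfc_b_vs_a, "C": lfc_c_vs_a}
ranking = sorted(levels, key=levels.get, reverse=True)

print(lfc_b_vs_c)  # -6.0
print(ranking)     # ['C', 'A', 'B']
```

With these numbers B-C comes out as -6 rather than +6, and the order is C, then A, then B: a negative B-A means B is expressed at a lower level than A. The pairwise values should only add up this neatly when all three comparisons come from the same fitted model.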