Hello all!
I am very new to R and DEG analyses, so please bear with me...
I normalized a set of raw counts using DESeq2. To do this, I also fed in metadata which distinguished my samples by "Patient". In other words, the only factor which distinguished the columns (samples) of my raw counts matrix was the patient they were sourced from (n=5).
From there (after some preparatory filtering, sorting and data mining), I performed KEGG GSEA by employing clusterProfiler with the l2fc data obtained from results(dds, tidy = TRUE). In doing so, I have successfully returned a gseaResult object comprised of upregulated and downregulated KEGG pathways (provided by $NES).
HOWEVER, this gseaResult object does NOT provide a separate NES for each "Patient". Rather, a single NES value is reported for each gene set (pathway). This is understandable because the l2fc input is not distinguished by sourced metadata, nor do the dds results distinguish l2fc by fed metadata.
My ultimate goal is to generate a heatmap of pathway NES values representing EACH patient (n=5). I presume this is possible... but I am having trouble seeing it through... Any help is much appreciated!
Please let me know if I can provide any further clarifying information.
Thank you... I will certainly check GSVA out.
On the topic of GSEA applied to a single, non-binary factor though, may I ask what exactly is the significance of +NES and -NES values?
To provide further clarity, the output of resultsNames(dds) is as follows:
[1] "Intercept"                      "Patient_Patient_1_vs_Patient_2"
[3] "Patient_Patient_3_vs_Patient_2" "Patient_Patient_4_vs_Patient_2"
[5] "Patient_Patient_5_vs_Patient_2"
I presume a significant KEGG pathway GSEA result with a positive NES value in this context means that its controlling gene set reported higher l2fc values... and this pathway is thereby deemed upregulated among the provided samples. However, this GSEA result says nada regarding whether say Patient 1 upregulates or downregulates X pathway. In my head, it does, however, suggest that the provided patient groups evidence differential regulation of the significant KEGG pathway GSEA results (because the contributing/controlling gene sets were previously deemed differentially expressed between these same patient groups by the preceding DESeq2 analysis).
The only thing I cannot wrap my head around is how to report an upregulated KEGG pathway GSEA result which represents input data from 5 "conditions"... is X pathway just reportedly upregulated more times than not in the sample population? If so, would one report along the lines of "X pathway reports mean positive NES values among the selected population BUT it is differentially enriched"?
Thank you, again, for any insights!
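On what the sign of NES means: a toy sketch may help. The function below is a simplified, unweighted running-sum enrichment score (not clusterProfiler's exact implementation, which weights by the ranking metric and normalizes against permutations). The gene names and gene sets are hypothetical. The point is only that the sign follows the direction of the ranking: if the genes are ranked by l2fc for "A vs B", a positive score means the set's genes sit near the top of that list (higher in A relative to B), and a negative score means they sit near the bottom.

```r
# Simplified, unweighted running-sum enrichment score.
# Walk down the ranked list: step up at genes in the set, step down otherwise,
# and report the running-sum value that deviates furthest from zero.
enrichment_score <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked_genes) - n_hit
  step   <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running <- cumsum(step)
  running[which.max(abs(running))]
}

# Hypothetical ranked list: g1 has the highest l2fc, g20 the lowest
ranked <- paste0("g", 1:20)

# Hypothetical gene sets clustered at either end of the ranking
set_top    <- c("g1", "g2", "g3", "g5")
set_bottom <- c("g16", "g18", "g19", "g20")

es_top    <- enrichment_score(ranked, set_top)     # positive
es_bottom <- enrichment_score(ranked, set_bottom)  # negative
```

So a single NES only ever describes one ranked list, i.e. one comparison. It does not summarize all five patients at once.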
One thing I didn't make clear in my answer is that GSVA is performed independently of differential gene expression results, i.e. you use your counts matrix as input, not DESeq2 results.
While I'm not sure I understand quite what you are asking, it does seem to me that one issue you are having is that there doesn't appear to be a "control" patient sample which can be used as a reference for comparing the gene expression of the 5 patient samples that you discuss here.
Short answer: I don't think this is feasible given your sample metadata/experimental design because of the lack of a control sample. I believe GSVA combined with sample clustering will be more useful to you in trying to answer the above question than GSEA. The idea with GSVA is that you calculate pathway enrichment scores for each sample; then you can ask questions like "which pathways are positively or negatively enriched in all 5 patient samples?".
Regarding "differential enrichment" using GSVA, one can use limma methods to ask what pathways show a statistically significant increase or decrease in GSVA enrichment scores between two samples. An example is provided in the GSVA documentation.
Ah… thank you very much for the clarification, it was much needed.
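To make the per-sample scoring idea from the GSVA suggestion concrete, here is a minimal base-R sketch. It does not call GSVA; it swaps in a much simpler combined z-score as a stand-in, purely to show the shape of the result (a pathways x samples matrix that can go straight into a heatmap). The expression matrix, gene sets, and the helper score_sets() are all hypothetical.

```r
set.seed(42)

# Hypothetical normalized expression matrix: 30 genes x 5 samples
expr <- matrix(rnorm(30 * 5), nrow = 30,
               dimnames = list(paste0("g", 1:30), paste0("Patient_", 1:5)))

# Hypothetical gene sets (stand-ins for KEGG pathways)
gene_sets <- list(
  pathway_A = paste0("g", 1:10),
  pathway_B = paste0("g", 11:25),
  pathway_C = paste0("g", 20:30)
)

# z-score each gene across samples, so scores are relative to the cohort
z <- t(scale(t(expr)))

# Combined z-score per gene set and sample: sum of gene z-scores / sqrt(set size).
# This is a simplified stand-in, NOT the GSVA algorithm.
score_sets <- function(z, gene_sets) {
  t(sapply(gene_sets, function(genes) {
    genes <- intersect(genes, rownames(z))
    colSums(z[genes, , drop = FALSE]) / sqrt(length(genes))
  }))
}

scores <- score_sets(z, gene_sets)  # rows = pathways, columns = samples

# One column per patient, no control sample required
heatmap(scores, Rowv = NA, Colv = NA, scale = "none")
```

Note that these scores are relative to the cohort average, so for each pathway they sum to roughly zero across samples. A positive value means "higher than the other patients", not "upregulated versus normal tissue".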
So, to reiterate: feeding differential gene expression results into GSEA only makes sense if these DEGs were obtained against a control. Expanding on this, if I did have a control variable to compare against each non-control sample characteristic, could I expect my dds object to have a column of l2fc values for EACH characteristic-control comparison? In other words, say I had 5 patients with X tumor and 1 control patient, would my DEG results from DESeq2 contain 5 separate l2fc columns? (Which I could then feed into GSEA to obtain 5 distinct GSEA results)
I apologize if this too is a silly question!
Not necessarily, it all depends on what questions you are trying to answer with this experiment. I only noted the idea of a control sample since it seems like you are interested in profiling the commonalities of these 5 samples as opposed to directly contrasting them in one-to-one differential expression analyses.
You would have 5 separate DESeq2 results tables, one for each patient, OR if your 5 samples were replicates of a specific condition then you would just have 1 results table containing the control vs treatment contrast.
Perfect, thank you very much for the above!
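For anyone landing here later, a rough sketch of the "one results table per patient vs control" case. The data are simulated with DESeq2's makeExampleDESeqDataSet(), and the group labels (Control, Patient_1 to Patient_5, two replicates each) are assumptions for illustration; DESeq2 needs replicates within groups to estimate dispersion. Calling results(dds) without name or contrast returns only one comparison (the last level over the reference), which would explain getting a single l2fc column.

```r
library(DESeq2)

set.seed(1)

# Simulated counts: 1000 genes x 12 samples (stand-in for real data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical design: 1 control group + 5 patients, 2 replicates each
groups <- rep(c("Control", paste0("Patient_", 1:5)), each = 2)
dds$Patient <- relevel(factor(groups), ref = "Control")
design(dds) <- ~ Patient

dds <- DESeq(dds, quiet = TRUE)

# Every non-intercept coefficient is one "patient vs control" comparison
coef_names <- setdiff(resultsNames(dds), "Intercept")

# One ranked l2fc vector per comparison, highest l2fc first
lfc_by_contrast <- lapply(setNames(coef_names, coef_names), function(nm) {
  res <- results(dds, name = nm)
  lfc <- setNames(res$log2FoldChange, rownames(res))
  sort(lfc[!is.na(lfc)], decreasing = TRUE)
})

names(lfc_by_contrast)
```

Each vector in lfc_by_contrast could then be passed as the geneList argument of clusterProfiler's gseKEGG() (after mapping gene names to the IDs it expects, not shown here), giving one set of NES values per comparison. Those NES columns side by side are what a per-patient heatmap would be built from.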