Hi,
I am trying to run single-sample gene set enrichment analysis on my RNA-seq gene expression data. I'd like to note that my data has continuous values, not discrete counts.
After running GSVA, I got enrichment scores for almost 400 gene sets. Now I am a bit unsure whether I have been doing the single-sample gene set enrichment analysis correctly for this kind of data up to this step. I have also read some articles about the NES score, but it was not clear to me. How can I interpret these scores if I draw networks between pathways? For example, that some pathways are on/off in normal versus disease samples, because I do not think I should interpret the scores as up- or down-regulated.
My second question: is there any limit (upper and lower) on these enrichment scores, which are both negative and positive?
I am also curious about the singscore library, but it is a bit complex and different from GSVA. Has anyone used it for this purpose before? Thank you very much in advance.
Hi Kevin, thank you for your reply. I had thought about it the way you said, as up- and down-regulated values. However, there were some biases in my results, which is why I am a bit confused. For example, a pathway that belongs to cancer appears up-regulated in all control samples. I did not expect this. I then checked the values in the matrix and realized that the control samples' values were very close to the cancer samples' values for this specific pathway.
Also, do you know whether there is any limit on the positive and negative enrichment scores? For example, can we say that positive values are between 0 and +1, or something like that?
Hey, there is no specific cut-off / limit / threshold for positive / negative (activated / not activated). How did you run GSVA, I mean, which method did you choose? Sometimes, after I run GSVA, I convert my output data matrix to Z-scores, which can make it easier to select the gene signatures / pathways that are statistically significantly enriched. Z > 1.96 and Z < -1.96 are equivalent to p < 0.05 (two-sided, assuming the Z-scores are approximately normally distributed).
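To make the Z-score idea concrete, here is a minimal sketch in plain Python. It assumes the scaling is done per pathway across samples (one possible reading of "convert the matrix to Z-scores"). The pathway names, the score values, and the helper names `zscore_rows` and `flag_extreme` are all hypothetical and are not part of GSVA.

```python
from statistics import mean, stdev


def zscore_rows(scores):
    """Scale each pathway's scores across samples to mean 0 and sd 1."""
    z = {}
    for pathway, values in scores.items():
        m = mean(values)
        s = stdev(values)  # sample standard deviation (n - 1 denominator)
        # A pathway with identical scores in every sample has no spread to scale by
        z[pathway] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return z


def flag_extreme(z, cutoff=1.96):
    """Return (pathway, sample_index, z_value) for every |z| above the cutoff."""
    return [
        (pathway, i, v)
        for pathway, values in z.items()
        for i, v in enumerate(values)
        if abs(v) > cutoff
    ]


# Hypothetical enrichment scores: rows are pathways, columns are samples
scores = {
    "PATHWAY_A": [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.60],
    "PATHWAY_B": [-0.20, -0.18, -0.22, -0.19, -0.21, -0.20, -0.18, -0.22],
}

z = zscore_rows(scores)
flagged = flag_extreme(z)
for pathway, sample_index, value in flagged:
    print(pathway, sample_index, round(value, 2))
```

Note that with a small number of samples the largest possible Z-score is limited by the sample size, so the 1.96 cut-off is only a rough guide in that situation.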
Regarding the false-positive enrichment for cancer, these false positive associations occur 'all of the time' and are an unfortunate consequence of the fact that many disease pathways overlap with normal biological function.
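One way to see how much pathways overlap is to compare the gene lists themselves. The sketch below computes the Jaccard index between every pair of gene sets. The gene set names, the placeholder gene identifiers, and the 0.3 threshold are invented for illustration, so treat it as a rough check under those assumptions, not a standard procedure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlapping_pairs(gene_sets, min_jaccard=0.3):
    """List gene set pairs whose Jaccard index is at least min_jaccard."""
    pairs = []
    for (name1, genes1), (name2, genes2) in combinations(gene_sets.items(), 2):
        j = jaccard(genes1, genes2)
        if j >= min_jaccard:
            pairs.append((name1, name2, j))
    # Most strongly overlapping pairs first
    return sorted(pairs, key=lambda t: -t[2])


# Hypothetical gene sets with placeholder gene identifiers
gene_sets = {
    "CANCER_PATHWAY": ["G1", "G2", "G3", "G4", "G5"],
    "CELL_CYCLE": ["G2", "G3", "G4", "G5", "G6"],
    "UNRELATED": ["G7", "G8", "G9"],
}

pairs = overlapping_pairs(gene_sets)
for name1, name2, j in pairs:
    print(name1, name2, round(j, 2))
```

A "cancer" gene set that shares most of its genes with a general process such as proliferation can score high in any sample where that process is active, including controls.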
Got it. So I can't apply any thresholds to the scores. I used ssgsea as the method, but according to the GSVA article, I think there is no significant difference between gsva and ssgsea. If it really depends on the method I chose, I can change it, but I don't think the result will change.
Z-scores make sense; I can try that too. But I guess there is no way to avoid these biases, I mean, pathways that are also enriched in control samples, because this biases the results.
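Since a pathway can score high in both groups, it may help to look at the difference between groups instead of the sign of the score. The sketch below computes, per pathway, the difference in mean enrichment score between cancer and control samples together with a Welch t statistic. The scores, group labels, and the helper name `group_difference` are hypothetical; a real analysis would more likely use a dedicated differential analysis tool on the score matrix.

```python
from math import sqrt
from statistics import mean, stdev


def group_difference(scores, groups, a="cancer", b="control"):
    """Per pathway: (mean(a) - mean(b), Welch t statistic)."""
    result = {}
    for pathway, values in scores.items():
        xa = [v for v, g in zip(values, groups) if g == a]
        xb = [v for v, g in zip(values, groups) if g == b]
        diff = mean(xa) - mean(xb)
        # Standard error without assuming equal variances in the two groups
        se = sqrt(stdev(xa) ** 2 / len(xa) + stdev(xb) ** 2 / len(xb))
        result[pathway] = (diff, diff / se if se > 0 else float("nan"))
    return result


# Hypothetical sample labels and enrichment scores (columns follow `groups`)
groups = ["control"] * 4 + ["cancer"] * 4
scores = {
    # High in every sample, so it does not separate the groups
    "PATHWAY_X": [0.41, 0.39, 0.40, 0.42, 0.40, 0.41, 0.39, 0.42],
    # Negative in controls, positive in cancer samples
    "PATHWAY_Y": [-0.30, -0.28, -0.31, -0.29, 0.25, 0.27, 0.24, 0.26],
}

res = group_difference(scores, groups)
for pathway, (diff, t) in res.items():
    print(pathway, round(diff, 3), round(t, 2))
```

In this toy data, PATHWAY_X looks "up" in all samples but shows essentially no difference between groups, which is the pattern described above for the cancer pathway in the control samples.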