Hello,
I am currently analyzing bulk RNA-seq data and have clustered my patients into 3 clusters based on the similarity of their transcriptomic profiles. For every sample, I also have many different phenotypic labels.
My goal now is to check whether one of the identified clusters is enriched for a certain phenotype (for example, being healthy).
My initial idea was to use a simple Fisher's exact test.
As a concrete example, imagine the following scenario:
I have identified 4 clusters, each with a different number of samples:
Cluster | Number of samples |
---|---|
1 | 41 |
2 | 32 |
3 | 29 |
4 | 26 |
I am interested in whether cluster 1 is enriched for healthy samples. Of the 41 samples in cluster 1, 13 are healthy and the rest (28) are unhealthy. In the 3 other clusters combined, there are 10 healthy and 77 unhealthy samples. So, if I understand everything correctly, the contingency table for my Fisher test should look like this:
Group | Healthy | Unhealthy |
---|---|---|
Cluster 1 | 13 | 28 |
Clusters 2-4 | 10 | 77 |
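To build the same table directly from per-sample labels, a minimal sketch in R could look like this. It assumes two hypothetical vectors, `cluster` (cluster assignment per sample) and `status` (phenotype label per sample), filled with made-up labels that merely reproduce the counts above. Setting the factor levels explicitly keeps cluster 1 in the first row and "healthy" in the first column, which is the orientation the one-sided `alternative` argument refers to.

```r
# Hypothetical per-sample labels that reproduce the example counts:
# cluster sizes 41, 32, 29, 26; 13 healthy samples in cluster 1 and
# 10 healthy samples among clusters 2-4 (for this one-vs-rest table it
# does not matter which of clusters 2-4 they belong to).
cluster <- rep(1:4, times = c(41, 32, 29, 26))
status  <- c(rep("healthy", 13), rep("unhealthy", 28),   # cluster 1
             rep("healthy", 10), rep("unhealthy", 77))   # clusters 2-4

# Collapse clusters into "cluster 1 vs. the rest" and fix the level order
# so that cluster 1 is the first row and "healthy" the first column.
group  <- factor(ifelse(cluster == 1, "cluster1", "other"),
                 levels = c("cluster1", "other"))
status <- factor(status, levels = c("healthy", "unhealthy"))

contingency_table <- table(group = group, status = status)
print(contingency_table)
```

If the levels were left to R's default alphabetical ordering, the row or column order could flip, and with it the meaning of "greater" versus "less".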
If I want to test for enrichment, I simply call `fisher.test(contingency_table, alternative = "greater")`. On the other hand, if I want to test for depletion, I call it with `alternative = "less"`.
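For reference, here is a self-contained sketch of both one-sided calls on the table above. It also cross-checks the p-values against the hypergeometric distribution using base R's `phyper()`: under the null, the number of healthy samples landing in cluster 1 follows a hypergeometric distribution with 23 healthy and 105 unhealthy samples overall and 41 samples in cluster 1, so the "greater" p-value is P(X ≥ 13) and the "less" p-value is P(X ≤ 13).

```r
# 2x2 table from the example (matrix() fills column by column):
# rows = cluster 1 vs. clusters 2-4, columns = healthy vs. unhealthy.
contingency_table <- matrix(
  c(13, 10, 28, 77),
  nrow = 2,
  dimnames = list(group  = c("cluster1", "other"),
                  status = c("healthy", "unhealthy"))
)

# One-sided tests on the odds ratio of the table as oriented above.
enrich  <- fisher.test(contingency_table, alternative = "greater")  # H1: OR > 1
deplete <- fisher.test(contingency_table, alternative = "less")     # H1: OR < 1

print(enrich$p.value)
print(deplete$p.value)
print(enrich$estimate)  # conditional maximum likelihood estimate of the odds ratio

# Cross-check with the hypergeometric distribution:
# m healthy and n unhealthy samples in total, k samples in cluster 1,
# x healthy samples observed in cluster 1.
m <- sum(contingency_table[, "healthy"])
n <- sum(contingency_table[, "unhealthy"])
k <- sum(contingency_table["cluster1", ])
x <- contingency_table["cluster1", "healthy"]

p_greater <- phyper(x - 1, m, n, k, lower.tail = FALSE)  # P(X >= x)
p_less    <- phyper(x, m, n, k)                          # P(X <= x)
print(c(p_greater = p_greater, p_less = p_less))
```

Both one-sided p-values include the probability of the observed count itself, P(X = 13), so they sum to slightly more than 1.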
I would very much appreciate it if someone could confirm whether this is indeed the way to go, or point out more sophisticated and suitable approaches.
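Since the goal is to check whether *any* of the clusters is enriched, the same one-vs-rest test would typically be repeated for every cluster (and possibly every phenotype label). A common extra step in that case is a multiple-testing adjustment; the sketch below uses Benjamini-Hochberg via base R's `p.adjust()`, a choice made for illustration only. The vectors `cluster` and `status` and the helper `one_vs_rest()` are hypothetical.

```r
# Hypothetical per-sample labels reproducing the example counts.
cluster <- rep(1:4, times = c(41, 32, 29, 26))
status  <- c(rep("healthy", 13), rep("unhealthy", 28),
             rep("healthy", 10), rep("unhealthy", 77))

# Hypothetical helper: one-sided Fisher test of "cluster k vs. all other
# clusters" for enrichment of one phenotype label. Levels are fixed so
# that "in cluster k" and "has the label" are the first row and column.
one_vs_rest <- function(k, cluster, status, phenotype = "healthy") {
  tab <- table(
    in_cluster = factor(cluster == k, levels = c(TRUE, FALSE)),
    has_label  = factor(status == phenotype, levels = c(TRUE, FALSE))
  )
  fisher.test(tab, alternative = "greater")$p.value
}

clusters <- sort(unique(cluster))
p_raw <- vapply(clusters, one_vs_rest, numeric(1),
                cluster = cluster, status = status)

# Benjamini-Hochberg adjustment across all tests that were run.
results <- data.frame(
  cluster      = clusters,
  p_enrichment = p_raw,
  p_adj_BH     = p.adjust(p_raw, method = "BH")
)
print(results)
```

If several phenotype labels are tested as well, the adjustment would usually be applied across all cluster-by-label tests together rather than per label.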