I may be completely wrong here, but this is my guess:
It looks like they maybe did do a differential expression analysis. In the Supplemental-2, which I think you were reading as well. It says: "The 1,500 most variably expressed genes were selected and used for
consensus average linkage hierarchical clustering (GDAC Firehose AWG,
http://gdac.broadinstitute.org/runs/awg_skcm__2014_02_23)"
I am guessing they maybe did differential gene expression analysis using some tool (that they didn't mention), and then arranged by p value from least to greatest and then used the top 1,500 (with lowest p-value at the top and greatest p-value at the bottom). They called these "variably" expressed genes which I imagine they meant "differentially" expressed genes.
Then they did hierarchical clustering I am guessing using the RSEM count data (log transformed and median centered) of just these 1,500 variably/differentially expressed genes. They probably "sliced" the hierarchical cluster at some level which allowed them to "capture the genes" for those distinct looking clusters on the heatmap, and then added the hierarchical clustering to a heatmap from ComplexHeatmap
to make the initial heatmap (see here for a question on StackOverflow that does this: https://stackoverflow.com/questions/77085300/getting-different-hierarchical-clustering-in-complexheatmap-for-the-same-method )
Then, I am guessing, they ran those genes in groups into some sort of gene enrichment database/tool to get the annotations for gene functions, and then finally, labeled them using maybe a variation of this: https://jokergoo.github.io/ComplexHeatmap-reference/book/heatmap-annotations.html#draw-textbox-annotation
Thank you for the ideas!
Though I think there was no differential expression analysis, because there is no normal samples in the dataset, There are only tumor samples so as I understand it, there is nothing to compare them to...