Hi,
The normalization should be performed by the tool you are using (the most popular being edgeR, DESeq2 and limma). Each of them normalizes the data differently, but if your data are robust (one important factor is having enough replicates), they should give similar results.
If you are using Trinity, there is a script called "run_DE_analysis.pl" which will perform the normalization (using edgeR, DESeq2 or limma, as you choose) and pairwise comparisons among your samples. To learn how to run it, you can follow this Trinity tutorial: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression. As you can read on that page, the script asks for a "matrix of raw read counts (not normalized!)". The tutorial explains every step (including drawing heatmaps).
Now, if you want information on how FPKM, RPKM and TPM work, I find this video useful (and by the way, all the videos from StatQuest are good): https://www.youtube.com/watch?time_continue=608&v=TTUrtCY2k-w. Basically, FPKM, RPKM and TPM normalize by library size (sequencing depth) and transcript length, which should be enough if all your samples come from the same tissue.
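To make the difference concrete, here is a minimal numeric sketch of RPKM/FPKM versus TPM for a single sample. All counts and gene lengths are made-up toy values; the point is only the order of operations (RPKM normalizes by depth first, TPM by length first):

```python
import numpy as np

# Hypothetical raw counts for 4 genes in one sample (toy values),
# with gene lengths in kilobases.
counts = np.array([100., 500., 50., 350.])
lengths_kb = np.array([2.0, 4.0, 1.0, 7.0])

# RPKM/FPKM: divide by library size (in millions of reads), then by length (kb)
rpkm = counts / (counts.sum() / 1e6) / lengths_kb

# TPM: divide by length first, then rescale so each sample sums to one million
rate = counts / lengths_kb
tpm = rate / rate.sum() * 1e6

print(rpkm)
print(tpm)  # always sums to exactly 1e6, which makes samples comparable
```

The reordering is why TPM is often preferred for between-sample comparison: the per-sample totals are forced to be identical, whereas RPKM/FPKM totals can differ between samples.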
I do not know a lot about TMM, but as I understand it, it also adjusts for library composition, which makes it useful when you want to compare different tissues: if a gene is heavily expressed in one tissue and not in the other, it will "absorb" most of the reads, and the other genes will seem less expressed. Here is a video explaining how DESeq2 normalizes data:
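The composition-aware idea behind DESeq2 (the "median of ratios" method) can be sketched in a few lines. This is a toy illustration with made-up counts, not the actual DESeq2 implementation (which, among other things, skips genes containing zeros when computing the geometric means):

```python
import numpy as np

# Hypothetical 3-gene x 2-sample count matrix (toy values);
# sample 2 is simply sequenced twice as deeply as sample 1.
counts = np.array([[100., 200.],
                   [ 10.,  20.],
                   [ 40.,  80.]])

# Per-gene geometric mean across samples acts as a pseudo-reference sample
log_counts = np.log(counts)
log_geo_means = log_counts.mean(axis=1)

# Size factor per sample = median of that sample's ratios to the reference;
# using the median makes it robust to a few highly expressed genes
size_factors = np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

normalized = counts / size_factors
print(size_factors)  # sample 2's factor is twice sample 1's
print(normalized)    # after division, the two samples agree gene by gene
```

Because the size factor is a median over genes, one gene "absorbing" most of the reads in a single tissue barely moves it, which is exactly the composition robustness described above.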
So in the end it depends on your experiment / data type.
Corentin
I think VST counts from DESeq2 might be a good choice (sequencing depth + composition bias correction) for heatmaps and MDS. But VST does not control for gene length, and I am not sure whether it is possible to get length-normalised VST.