I searched for many discussions regarding RNA-Seq, many of them used STAR + FeatureCounts for their "standard" analysis and suggested using "TPM" for count data normalisation.
I find it hard to believe people were suggesting TPM normalization and using STAR + featureCounts for the same analysis. TPM means "transcripts per million" and is calculated for transcripts. It is meant as a within-sample normalization, to allow comparing the expression of different transcripts within a sample. The combination STAR + featureCounts (or STAR with --quantMode GeneCounts) is generally used to output counts of reads mapping to genes, in order to perform differential gene expression analysis between samples.
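To make the "within-sample" point concrete, here is a minimal sketch of the usual TPM calculation for one sample: divide each count by the feature length in kilobases, then rescale so the values sum to one million. The function name `tpm` and the toy numbers are made up for illustration; they do not come from any particular tool.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for a single sample.

    counts     -- read (or fragment) counts, one per transcript
    lengths_bp -- transcript lengths in base pairs, same order as counts
    """
    # Length-normalise first: reads per kilobase of transcript.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Then rescale so that the sample sums to one million.
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Toy example (hypothetical numbers): three transcripts in one sample.
counts = [100, 200, 300]
lengths = [1000, 2000, 1000]
values = tpm(counts, lengths)
```

Because every sample is rescaled to the same total, TPM values describe relative abundance inside that sample. That is why they are not a substitute for the raw counts that edgeR or DESeq2 expect.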
As for your questions:
1) I don't remember seeing a recent independent comparison of STAR and Subread, but I expect both to be pretty accurate and to produce similar results. STAR is probably faster, but uses a lot more memory than Subread (the recent Rsubread paper claims Rsubread is faster than STAR, but I will only believe this once I see an independent comparison).
A few years ago (which means many, many versions ago), I compared STAR with --quantMode GeneCounts against Subread + featureCounts. The results were very similar, and the downstream gene expression analysis largely yielded the same results.
2) Depending on the workflow you choose, you may never use TPMs. TPMs would make sense for some visualizations, but there are other transformations: for example, edgeR uses CPMs, and DESeq2 uses the rlog or vst transformations.
Of course, TPMs make sense for tools designed to be used with TPMs, but as I have always used edgeR or DESeq2, I don't have the knowledge to comment further on this.
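For comparison with TPM, the CPM mentioned in point 2 ignores feature length entirely and only scales by library size. The sketch below is plain Python with a hypothetical helper name `cpm`. It is not edgeR's implementation, which can additionally use normalisation factors to adjust the library size.

```python
def cpm(counts):
    """Counts per million for a single sample (no length correction)."""
    library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


# Toy example (hypothetical numbers): three genes in one sample.
gene_counts = [10, 30, 60]
cpm_values = cpm(gene_counts)
```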
3) Salmon and kallisto already output TPMs (along with estimated transcript counts), and these would be more accurate than TPMs estimated from featureCounts output. To estimate TPMs from featureCounts output, you would either (a) use featureCounts to summarize over transcripts (I don't even know if featureCounts allows this, but if it does, the counts will be bogus, because featureCounts discards reads mapping to multiple features), or (b) estimate a gene-level TPM from the featureCounts results, but then, what length should you use? Total gene length? The average of the transcript lengths?
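The length question in point 3 is not a detail. A small sketch with made-up numbers shows that the same gene counts give different gene-level "TPMs" depending on which length you plug in. Everything here is hypothetical: the gene names, the counts, both sets of lengths, and the helper `gene_tpm`.

```python
def gene_tpm(counts, lengths_bp):
    """Gene-level TPM from dicts keyed by gene name."""
    # Reads per kilobase, using whatever "gene length" was supplied.
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    total = sum(rates.values())
    return {g: r / total * 1e6 for g, r in rates.items()}


# Hypothetical gene-level counts, as one might get from featureCounts.
counts = {"geneA": 700, "geneB": 400}

# Choice 1: total (merged exon) length of each gene.
union_length = {"geneA": 3500, "geneB": 2000}
# Choice 2: mean length of each gene's annotated transcripts.
mean_tx_length = {"geneA": 2000, "geneB": 2000}

by_union = gene_tpm(counts, union_length)
by_mean = gene_tpm(counts, mean_tx_length)

for gene in counts:
    print(gene, round(by_union[gene]), round(by_mean[gene]))
```

With these numbers the two genes come out equal under the first choice, and geneA comes out clearly higher under the second. Neither length reflects which transcripts were actually expressed. That is the information Salmon and kallisto estimate and featureCounts does not.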