Question

How to obtain TPM data after batch effect correction from count data obtained using RSEM

9 days ago

Apprentice ▴ 170

I have raw counts and TPM values for 12 samples that were sequenced by RNA-seq and quantified with the STAR-RSEM pipeline. The samples were processed in batches of 6, so the data for these 12 samples contain batch effects. I would like to run a cluster analysis on the log2 TPM values for the 12 samples. As I understand it, I should first obtain batch-corrected TPM values, then log2-transform them, and then cluster. What processing should I apply to the raw counts or TPM values output by RSEM to get there? I would be grateful for your guidance.

STAR RNA-seq Batch RSEM effect

updated 9 days ago by Gordon Smyth ★ 7.9k • written 9 days ago by Apprentice ▴ 170

score 1 · Answer 1 · 2025-04-04

I would use divided counts from RSEM for transcript-level analyses, and log-CPM computed from the divided counts for clustering, for the reasons explained in the papers below. The catchRSEM() function in edgeR reads the RSEM output into this pipeline.

Baldoni PL, Chen L, Smyth GK (2024). Faster and more accurate assessment of differential transcript expression with Gibbs sampling and edgeR v4. NAR Genomics and Bioinformatics 6(4), lqae151.
Baldoni PL, Chen Y, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK (2024). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13.
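
The workflow described in the answer could look roughly like the sketch below. It is only an illustration under several assumptions, not the answerer's exact code: the function name `batch_corrected_logcpm`, the directory name and the batch labels are hypothetical; the TMM normalization step and the use of limma's `removeBatchEffect()` for the batch adjustment are additions that the answer does not mention; and RSEM is assumed to have been run with the options that `catchRSEM()` needs to estimate overdispersion (see `?catchRSEM`).

```r
library(edgeR)  # catchRSEM(), DGEList(), calcNormFactors(), cpm()
library(limma)  # removeBatchEffect()

# Hypothetical helper: read RSEM transcript output, form divided counts,
# and return batch-adjusted log2-CPM values suitable for clustering.
batch_corrected_logcpm <- function(files, batch) {
  batch <- factor(batch)
  stopifnot(length(files) == length(batch))

  # Read the RSEM "isoforms.results" files; the result holds a count
  # matrix and an annotation table with an Overdispersion column.
  rsem <- catchRSEM(files)

  # Divided counts: each transcript's counts divided by its estimated
  # read-to-transcript-ambiguity overdispersion.
  divided <- rsem$counts / rsem$annotation$Overdispersion
  y <- DGEList(counts = divided, genes = rsem$annotation)

  # Assumption: TMM normalization before computing log-CPM.
  y <- calcNormFactors(y)

  # log2 counts-per-million from the divided counts.
  logcpm <- cpm(y, log = TRUE)

  # Assumption: remove the batch term for visualization and clustering only.
  removeBatchEffect(logcpm, batch = batch)
}

# Example use (paths and batch labels are placeholders):
# files  <- list.files("rsem_output", pattern = "isoforms.results$", full.names = TRUE)
# batch  <- rep(c("A", "B"), each = 6)   # must follow the order of `files`
# logcpm <- batch_corrected_logcpm(files, batch)
# plot(hclust(dist(t(logcpm))))
```

If the biological groups of interest are known, passing them to `removeBatchEffect()` through its `design` argument helps avoid removing real signal along with the batch term. The adjusted values are meant for clustering and plotting rather than as input to differential expression testing, where the batch would normally go into the model instead.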