How to obtain TPM data after batch effect correction from count data obtained using RSEM
9 days ago
Apprentice ▴ 170

I have raw count data and TPM data for 12 samples that were measured by RNA-seq and quantified with the STAR-RSEM pipeline. The samples were processed in two batches of 6, so the data for these 12 samples contain batch effects. I would like to apply cluster analysis to the log2 TPM values for the 12 samples. My understanding is that I should first obtain TPM data corrected for batch effects, then apply the log2 transformation and run the cluster analysis. What processing should I apply to the raw counts or TPM values output by RSEM to achieve this? I would be grateful for your guidance.

STAR RNA-seq Batch RSEM effect
9 days ago
Gordon Smyth ★ 7.9k

I would use divided counts from RSEM for transcript-level analyses, and compute log-CPM from the divided counts for clustering, for the reasons explained in the papers below. The catchRSEM() function in edgeR reads the RSEM output into this pipeline.

  • Baldoni PL, Chen L, Smyth GK (2024). Faster and more accurate assessment of differential transcript expression with Gibbs sampling and edgeR v4. NAR Genomics and Bioinformatics 6(4), lqae151.
  • Baldoni PL, Chen Y, Hediyeh-zadeh S, Liao Y, Dong X, Ritchie ME, Shi W, Smyth GK (2024). Dividing out quantification uncertainty allows efficient assessment of differential transcript expression with edgeR. Nucleic Acids Research 52(3), e13.
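
To make the arithmetic concrete, here is a minimal NumPy sketch of the idea, not edgeR itself. It assumes you already have a transcripts-by-samples count matrix and one overdispersion estimate per transcript (in practice these would come from catchRSEM()). The toy numbers, the helper names `log_cpm` and `center_batches`, and the simple per-batch centering step are all assumptions for illustration. The centering ignores any biological grouping, so it is only reasonable when the groups of interest are balanced across batches.

```python
import numpy as np

def log_cpm(counts, prior_count=2.0):
    """Simplified log2 counts-per-million for a transcripts x samples matrix.

    This is a stand-in for illustration, not edgeR's cpm(): the prior count
    is added unscaled and library sizes are plain column sums.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    return np.log2((counts + prior_count) / (lib_size + 2 * prior_count) * 1e6)

def center_batches(logexpr, batch):
    """Subtract each batch's per-transcript mean, then restore the overall mean."""
    logexpr = np.asarray(logexpr, dtype=float)
    batch = np.asarray(batch)
    out = logexpr.copy()
    grand_mean = logexpr.mean(axis=1, keepdims=True)
    for b in np.unique(batch):
        cols = batch == b
        out[:, cols] -= logexpr[:, cols].mean(axis=1, keepdims=True)
    return out + grand_mean

# Hypothetical toy data: 4 transcripts x 4 samples, two batches of 2 samples.
counts = np.array([[100, 120, 200, 240],
                   [ 50,  40,  60,  50],
                   [ 10,  12,  11,   9],
                   [500, 480, 510, 520]], dtype=float)
overdispersion = np.array([1.0, 2.0, 1.5, 4.0])  # hypothetical per-transcript values
batch = np.array(["A", "A", "B", "B"])

# "Divided counts": each transcript's counts divided by its overdispersion.
divided = counts / overdispersion[:, None]

# log-CPM on the divided counts, then remove per-batch shifts before clustering.
logcpm = log_cpm(divided)
corrected = center_batches(logcpm, batch)
```

The `corrected` matrix (samples in columns) is what would be passed to a clustering routine. Working on the log scale keeps the batch adjustment additive, which is one reason to correct log-CPM values rather than trying to correct TPM first and take logs afterwards.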