Question

Performing differential expression analysis after applying transformations on my data

2.0 years ago

AlexStar ▴ 200

I have TPM-normalized RNA-seq data from several different sources. I merged these datasets, applied a log2 transformation, and then corrected for batch effects. These steps brought all samples into a similar range, which is crucial for consistency.

I understand that differential expression analysis is typically done on raw counts, but I don't have them. I know limma's voom approach can handle normalized data, but is it still applicable after log2 transformation and batch-effect correction?

Running differential expression analysis on the merged TPM data without these transformations may not give accurate results, because the values of some samples differ considerably from the rest. What's the recommended approach?

normalization differential-expression r • 2.0k views

While Limma's voom approach works for normalized data, is it still applicable after log2 transformation and batch effect correction?

The recommended approach is to include batch as a covariate in the model rather than perform DE on batch-corrected data, especially TPM data, which are not directly comparable across samples.

2.0 years ago by Ram 45k

I see. Currently my design looks like this (Benefit has only two values):

design = model.matrix(~ 0 + Benefit, meta_for_train)

I'm removing the batch effect of cancer type (I have 4 cancer types). Are you saying my design should look like this?

design = model.matrix(~ 0 + Benefit + Cancer_Type, meta_for_train)
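For reference, here is a minimal base-R sketch of what the second design does mechanically. It assumes, hypothetically, that Benefit has levels "No" and "Yes", uses four made-up cancer-type labels, and simulates log2 expression values; none of these come from the thread. With `~ 0`, Benefit gets one column per level and Cancer_Type gets treatment-coded columns for three of its four levels, so the BenefitYes minus BenefitNo contrast estimates the Benefit difference adjusted for cancer type. Plain least squares via `lm.fit` stands in for limma here only to keep the example dependency-free.

```r
# Sketch: cell-means coding for Benefit, with Cancer_Type as a covariate.
# Hypothetical assumptions: Benefit levels "No"/"Yes", cancer types
# "BLCA"/"LUAD"/"MEL"/"RCC", and simulated log2 expression values.
set.seed(1)

meta_for_train <- data.frame(
  Benefit     = factor(rep(c("No", "Yes"), times = 20)),
  Cancer_Type = factor(rep(c("BLCA", "LUAD", "MEL", "RCC"), each = 10))
)

# ~ 0 gives one column per Benefit level; Cancer_Type is treatment-coded,
# so its first level (BLCA) is the reference and gets no column.
design <- model.matrix(~ 0 + Benefit + Cancer_Type, meta_for_train)
print(colnames(design))

# Contrast vector: BenefitYes minus BenefitNo, all covariate columns set to 0.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast[c("BenefitYes", "BenefitNo")] <- c(1, -1)

# Simulated log2 expression matrix: genes in rows, samples in columns.
n_genes <- 100
expr <- matrix(rnorm(n_genes * nrow(design), mean = 5), nrow = n_genes)

# Ordinary least squares for every gene at once (lm.fit accepts a matrix response).
fit   <- lm.fit(design, t(expr))
coefs <- fit$coefficients            # ncol(design) x n_genes
logFC <- drop(contrast %*% coefs)    # cancer-type-adjusted Benefit difference per gene
head(logFC)
```

In limma, the same `design` and contrast would be passed to `lmFit`, `contrasts.fit` and `eBayes` to get moderated statistics. For log-scale values such as log2 TPM, `eBayes(..., trend = TRUE)` (limma-trend) is the more natural fit, since voom expects raw counts as input.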

2.0 years ago by AlexStar ▴ 200

Your batch variable is Cancer Type?

2.0 years ago by Ram 45k

Yes, it's a pan-cancer study.

2.0 years ago by AlexStar ▴ 200

Why are you batch correcting data where the batch is such a critical biological variable? That makes no sense.

2.0 years ago by Ram 45k

Because I want my model to classify response regardless of cancer type or anatomical location. I could instead use cancer type as a predictive feature; you're right that it can be an important variable, so that's an option.

2.0 years ago by AlexStar ▴ 200

You seem to have a really nice machine learning background but not a great cancer background. Cancer is heterogeneous even within a specific cancer type, how do you expect your model to classify response just based on horribly mangled generic RNA-seq data? We use multi-omics on highly specific cancer subtypes and our models are not all that amazing, I don't see how removing critical biological information is going to give you anything better than a crapshoot.

2.0 years ago by Ram 45k

Are you suggesting that I should skip batch correction for cancer type and instead include it as a predictive feature in my model? I also have other variables such as gender, treatment type, and the outputs of several deconvolution algorithms giving cell abundances for each sample, and I use those cell abundances as predictive features as well.
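If cancer type, gender and treatment type are used as predictive features, they usually have to be turned into numeric columns first. Below is a minimal base-R sketch of one-hot encoding them next to a cell-abundance estimate. The column names `Treatment` and `CD8_T_cells`, and all values, are hypothetical stand-ins for the real metadata and deconvolution output.

```r
# Sketch: one-hot encode categorical clinical variables and combine them
# with numeric features. All column names and values are hypothetical.
meta_for_train <- data.frame(
  Cancer_Type = factor(c("BLCA", "LUAD", "MEL", "RCC", "MEL", "LUAD")),
  Gender      = factor(c("F", "M", "M", "F", "F", "M")),
  Treatment   = factor(c("antiPD1", "antiPD1", "combo", "combo", "antiPD1", "combo")),
  CD8_T_cells = c(0.12, 0.05, 0.20, 0.08, 0.15, 0.03)  # stand-in deconvolution estimate
)

factor_cols <- c("Cancer_Type", "Gender", "Treatment")

# contrasts = FALSE gives an identity contrast matrix per factor, so every
# level gets its own 0/1 column (full one-hot) instead of R's reference coding.
one_hot <- model.matrix(
  ~ 0 + Cancer_Type + Gender + Treatment,
  data = meta_for_train,
  contrasts.arg = lapply(meta_for_train[factor_cols], contrasts, contrasts = FALSE)
)

# Append the numeric features to the encoded categorical ones.
features <- cbind(one_hot, CD8_T_cells = meta_for_train$CD8_T_cells)
print(dim(features))
print(colnames(features))
```

Full one-hot columns are collinear with one another (each factor's columns sum to 1), which generally matters for unregularized linear models but not for tree-based or penalized ones; dropping one level per factor is the usual alternative.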

2.0 years ago by AlexStar ▴ 200

I don't know machine learning, so I can't speak to "incorporate it as a predictive feature". My point is that treating cancer type as a mere batch variable will result in immense loss of context. Given how narrow your data is, such a broad question will not work in your favor. But I'm no expert on machine learning so you might stumble upon something. I'd recommend you consult some folks that have experience in cancer RNA-seq and make sure you understand what you're expecting from your data.

2.0 years ago by Ram 45k

I see. You're right; we decided not to correct for cancer type and to use it as a predictor instead. Thanks for the help!

2.0 years ago by AlexStar ▴ 200

score 0 · Answer 1 · 2023-08-10

A valid approach would be to analyze each dataset separately for DE and then perform a meta-analysis of the DE results using any of the following methods: