Question

Normalization of RNA-seq data across tumors. Should I pool the samples?


3.5 years ago

BioNovice247 ▴ 20

Hi all,

I want to run a correlation analysis between matched TCGA mRNA and miRNA samples from multiple projects. Specifically, I want to run the analysis once for each tumor type (e.g. TCGA LUAD) independently of the others, and then once more across all tumor types. I can see two ways to normalize the data. One is to pool the samples from all tumor types, normalize the counts using all samples together, and then split this large matrix into separate matrices that each contain samples from a single tumor type before running the analysis on them. The other is to construct tumor-specific count matrices first and normalize each matrix separately. Which is better?

In short, is it more appropriate to normalize the samples using the complete data set (which includes heterogeneous data from different cancer types) and then run the analysis on different portions (tumor types), or is it better to normalize each portion independently of the others? I am using VST (variance-stabilizing transformation) for normalization, btw.
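To make the two options concrete, here is a minimal sketch in Python/NumPy of the size-factor step only, using a median-of-ratios calculation in the style of DESeq2. It is not DESeq2's own code, the simulated counts and all names (`size_factors`, `tumor_type`, and so on) are made up for illustration, and a real VST also fits a dispersion trend from whichever samples it is given, so pooled and separate runs can differ in that step too.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene log geometric mean across the samples that were passed in.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to that reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Hypothetical toy data: 200 genes, two tumor types with 5 samples each.
rng = np.random.default_rng(0)
base_a = rng.uniform(50, 500, size=200)
base_b = base_a * rng.lognormal(0.0, 0.5, size=200)  # type B differs per gene
depth = rng.uniform(0.5, 2.0, size=10)                # sequencing depth
mu = np.column_stack([base_a] * 5 + [base_b] * 5) * depth
counts = rng.poisson(mu) + 1                          # +1 keeps counts positive
tumor_type = np.array(["A"] * 5 + ["B"] * 5)

# Option 1: normalize on the pooled matrix, then subset to one tumor type.
pooled = size_factors(counts)
is_a = tumor_type == "A"
pooled_a = pooled[is_a]

# Option 2: subset first, then normalize the tumor-specific matrix.
separate_a = size_factors(counts[:, is_a])

print("pooled, type A:  ", np.round(pooled_a, 3))
print("separate, type A:", np.round(separate_a, 3))
```

The point of the sketch is that the per-gene reference (the geometric mean) is computed from whatever samples are passed in, so the size factors for the same samples depend on whether the other tumor types are in the matrix.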

Thanks in advance for your time

DESeq2 TCGA RNA-seq VST Normalization • 1.1k views



Do you expect these tumours to differ from one another? Is there a compositional difference between them? If so, normalizing them together might remove biological differences between the tumour samples.

3.5 years ago by Igor ▴ 50


Hi Igor. These tumor types are distinct entities. However, my analysis is less concerned with differences between tumor types and more concerned with associations between different molecules within and across tumor types.

3.5 years ago by BioNovice247 ▴ 20