3.5 years ago
pinheirofabiano
I want to compare 7 RNA-seq datasets from patients with pancreatic cancer with 7 samples of normal pancreatic tissue.
The cancer samples and the controls, however, are not from the same patients.
I would like to be sure my results are trustworthy, so I'm concerned about batch effects.
How can I identify batch effects in my analysis, and how can I correct for them?
I want to perform gene co-expression analysis.
Thank you very much,
Fabiano
Hi,
Since the data are from TCGA, I would first run a PCA to inspect how the samples cluster.
What kind of information does TCGA give you for these data?
If you have batch information you have 2 options,
Are the samples (cancer vs. control) from the same study, or are they from completely different sources? If the latter, you cannot correct for it, because batch would then be completely confounded with condition.
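To make the confounding concern concrete, here is a minimal sketch that cross-tabulates condition against a batch proxy. It assumes TCGA-style aliquot barcodes, where the first two digits of the fourth field give the sample type (01-09 tumor, 10-19 normal) and the sixth field is the plate ID, which is often used as a batch proxy. The barcodes below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical TCGA-style barcodes, invented for illustration only.
barcodes = [
    "TCGA-XX-0001-01A-11R-A100-07",
    "TCGA-XX-0002-01A-11R-A100-07",
    "TCGA-XX-0003-11A-11R-A100-07",
    "TCGA-XX-0004-01A-11R-A200-07",
    "TCGA-XX-0005-11A-11R-A300-07",
]

def parse_barcode(barcode):
    """Return (condition, plate) for one aliquot barcode."""
    fields = barcode.split("-")
    sample_code = int(fields[3][:2])  # 01-09 tumor, 10-19 normal, 20-29 control
    if sample_code < 10:
        condition = "tumor"
    elif sample_code < 20:
        condition = "normal"
    else:
        condition = "control"
    plate = fields[5]  # assumption: plate ID serves as the batch label
    return condition, plate

def cross_tabulate(barcode_list):
    """Count samples for every (condition, plate) combination."""
    return Counter(parse_barcode(b) for b in barcode_list)

def confounded_plates(table):
    """Return plates that contain samples from only one condition."""
    conditions_per_plate = {}
    for condition, plate in table:
        conditions_per_plate.setdefault(plate, set()).add(condition)
    return sorted(p for p, conds in conditions_per_plate.items() if len(conds) == 1)

table = cross_tabulate(barcodes)
for (condition, plate), n in sorted(table.items()):
    print(f"{plate}\t{condition}\t{n}")
print("Plates with a single condition:", confounded_plates(table))
```

If every plate holds only tumor or only normal samples, batch and condition cannot be separated statistically, which is the situation where correction is not possible.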
All the samples are from the TCGA.
Should be fine then, I guess. Perform a PCA (e.g. check the PCAtools package at Bioconductor) and see how the samples cluster; this can reveal potential batch effects. For this, one would use transformed data, such as vst output from DESeq2 or the normalized counts on the log2 scale.
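As a rough illustration of the idea (not a replacement for PCAtools or DESeq2), the sketch below log2-transforms a small count matrix and scores each sample on the first principal component, using only the Python standard library. The counts are invented, log2(count + 1) merely stands in for a proper vst or normalized log2 transform, and the function names are hypothetical.

```python
import math

# Hypothetical raw counts (rows = samples, columns = genes), invented for illustration.
counts = [
    [1000, 900, 50, 20],  # tumor 1
    [1100, 950, 55, 22],  # tumor 2
    [950, 870, 48, 19],   # tumor 3
    [100, 80, 52, 21],    # normal 1
    [120, 90, 49, 20],    # normal 2
    [90, 85, 51, 23],     # normal 3
]
labels = ["tumor"] * 3 + ["normal"] * 3

def log2_transform(matrix):
    """Return log2(count + 1) for every entry (a stand-in for vst)."""
    return [[math.log2(value + 1) for value in row] for row in matrix]

def first_pc_scores(matrix, iterations=100):
    """Score each sample on the first principal component via power iteration."""
    n_samples, n_genes = len(matrix), len(matrix[0])
    # Center every gene (column) at zero mean across samples.
    means = [sum(row[g] for row in matrix) / n_samples for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    # Deterministic, non-uniform starting direction in gene space.
    direction = [float(g + 1) for g in range(n_genes)]
    for _ in range(iterations):
        # Multiply by X, then by X^T, then renormalize.
        scores = [sum(x * d for x, d in zip(row, direction)) for row in centered]
        direction = [sum(s * row[g] for s, row in zip(scores, centered))
                     for g in range(n_genes)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
    return [sum(x * d for x, d in zip(row, direction)) for row in centered]

scores = first_pc_scores(log2_transform(counts))
for label, score in zip(labels, scores):
    print(f"{label}\t{score:.2f}")
```

With real data, one would color the samples by condition and by any available batch label. If samples separate by batch rather than by condition, that points to a batch effect.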