Batch Effect

3.5 years ago

pinheirofabiano ▴ 100

I want to compare 7 RNA-seq samples from patients with pancreatic cancer against 7 samples of normal pancreatic tissue.

The cancer samples and the controls, however, are not from the same patients.

I would like to be sure my results are trustworthy, so I'm concerned about batch effects.

How can I identify batch effects in my analysis, and how can I correct for them?

I want to perform gene co-expression analysis.

Thank you very much,

Fabiano

r RNA-seq • 1.1k views

ADD COMMENT • link updated 3.5 years ago by Matina ▴ 250 • written 3.5 years ago by pinheirofabiano ▴ 100

Hi,

Since the data are from TCGA, I would first run a PCA to inspect how the samples cluster.

What kind of information does TCGA give you for these data?

If you have batch information, you have two options:

1. Correct the gene expression values using a tool like ComBat (in the sva package).
2. Add batch as a covariate during differential expression analysis.
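
To illustrate the covariate option, here is a minimal sketch of the idea, not a replacement for a proper differential expression package. It fits a per-gene linear model with both batch and condition in the design matrix, so the condition effect is estimated after accounting for batch. The data, labels, and effect sizes are all made up for illustration, and plain NumPy least squares stands in for whatever model your DE tool actually fits.

```python
import numpy as np

# Hypothetical toy data: 8 samples, 3 genes, values on the log2 scale.
# condition: 0 = normal, 1 = tumor; batch: 0 or 1 (both invented here).
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
batch = np.array([0, 1, 0, 1, 0, 1, 0, 1])

rng = np.random.default_rng(0)
true_condition_effect = np.array([2.0, 0.0, -1.0])  # per-gene tumor effect
true_batch_effect = np.array([1.5, 1.5, 1.5])       # per-gene batch shift

# Simulated expression: baseline + condition effect + batch shift + noise.
expr = (
    5.0
    + np.outer(condition, true_condition_effect)
    + np.outer(batch, true_batch_effect)
    + rng.normal(scale=0.1, size=(8, 3))
)

# Design matrix with columns: intercept, batch, condition.
design = np.column_stack([np.ones(len(condition)), batch, condition])

# One least-squares fit per gene (columns of expr are fitted together).
coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
batch_effect = coef[1]      # estimated batch shift per gene
condition_effect = coef[2]  # estimated tumor-vs-normal effect, adjusted for batch

print("estimated condition effect:", np.round(condition_effect, 2))
print("estimated batch effect:    ", np.round(batch_effect, 2))
```

This only works because each batch contains both tumor and normal samples, so the two effects can be separated.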

ADD REPLY • link 3.5 years ago by Matina ▴ 250

Are the samples (cancer vs. control) from the same study, or do they come from completely different sources? If the latter, you cannot correct for it, because batch and condition would be completely confounded.
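
One way to check for that kind of confounding before attempting any correction is to test whether a design with both batch and condition has full column rank. This is a rough sketch using NumPy; the function name `batch_is_separable` and the labels are hypothetical, chosen only for this example.

```python
import numpy as np

def batch_is_separable(condition, batch):
    """Return True if condition and batch effects can be estimated together.

    Builds a design matrix (intercept + indicator columns for every
    non-reference batch and condition level) and checks for full column rank.
    """
    condition = np.asarray(condition)
    batch = np.asarray(batch)
    cols = [np.ones(len(condition))]
    # One indicator column per level, skipping the first (reference) level.
    for level in np.unique(batch)[1:]:
        cols.append((batch == level).astype(float))
    for level in np.unique(condition)[1:]:
        cols.append((condition == level).astype(float))
    design = np.column_stack(cols)
    return bool(np.linalg.matrix_rank(design) == design.shape[1])

# Hypothetical case 1: all tumors from source A, all normals from source B.
confounded = batch_is_separable(
    ["tumor"] * 3 + ["normal"] * 3,
    ["A"] * 3 + ["B"] * 3,
)

# Hypothetical case 2: both conditions present in both sources.
crossed = batch_is_separable(
    ["tumor", "tumor", "normal", "normal"],
    ["A", "B", "A", "B"],
)

print("fully confounded design separable?", confounded)
print("crossed design separable?         ", crossed)
```

In the first case the batch column carries exactly the same information as the condition column, so no model can tell the two apart.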

ADD REPLY • link 3.5 years ago by ATpoint 85k

All the samples are from TCGA.

ADD REPLY • link 3.5 years ago by pinheirofabiano ▴ 100

Should be fine then, I guess. Perform a PCA (e.g. with the PCAtools package from Bioconductor) and see how the samples cluster; this can reveal potential batch effects. For this, use transformed data, such as the output of vst from DESeq2 or normalized counts on the log2 scale.
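
To show what the PCA check looks like in principle, here is a small sketch in plain NumPy rather than PCAtools or DESeq2. It assumes the counts are already normalized for library size, uses log2(count + 1) as a simple stand-in for vst, and runs on simulated data with an artificial batch shift; the helper name `pca_scores` is made up for this example.

```python
import numpy as np

def pca_scores(log_expr, n_components=2):
    """PCA via SVD on a samples x genes matrix of log-scale expression.

    Returns the sample scores and the fraction of variance explained
    by each of the first n_components components.
    """
    centered = log_expr - log_expr.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Hypothetical data: 6 samples x 50 genes, two batches of 3 samples each.
rng = np.random.default_rng(1)
batch = np.array([0, 0, 0, 1, 1, 1])
counts = rng.poisson(lam=100, size=(6, 50)).astype(float)
counts[batch == 1, :25] *= 4  # artificial batch shift in half of the genes

# Assumes counts are already normalized; log2(x + 1) stands in for vst.
log_expr = np.log2(counts + 1)

scores, explained = pca_scores(log_expr)
for b, (pc1, pc2) in zip(batch, scores):
    print(f"batch={b}  PC1={pc1:7.2f}  PC2={pc2:7.2f}")
print("variance explained:", np.round(explained, 3))
```

If the samples separate along the first components by batch (or by any technical annotation TCGA provides) rather than by tumor vs. normal, that suggests a batch effect worth modelling.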

ADD REPLY • link 3.5 years ago by ATpoint 85k
