Batch correction of TCGA data from TCGAbiolinks and cBioPortal
15 months ago
Petesview ▴ 10

Hi,

I have a specific question: are the STAR counts for TCGA-BRCA downloaded with GDCquery (TCGAbiolinks) batch corrected? If not, I would like to correct for batch effects before performing differential expression analysis with voom. An alternative is to analyse this dataset on cBioPortal. I know that cBioPortal has applied batch correction to some cancer types, such as PRAD, UCEC, COAD, and READ, but it does not explicitly state whether the BRCA data were also corrected.

Finally, if batch correction is needed for TCGA-BRCA, how should I go about it? I am relatively new to the area. Thanks for your help.

RNA-seq • 948 views
15 months ago
Zhenyu Zhang ★ 1.2k

The answer is no: the STAR counts are not batch corrected. The reason is that automatic batch correction is easy to get wrong.

  • Example: I want to study the difference between LUAD and LUSC. I gathered the data and performed an automatic batch correction. The result? The difference between LUAD and LUSC was treated as the top batch effect and was removed.
  • How to do this correctly? Suppose I suspect that several sample properties might carry a batch effect. For the first one, I tested shipment batch. I made a lot of plots and found that two particular sample shipping dates were associated with differences in RNA-Seq expression. I checked the genes involved, and they suggested some degradation. I then decided to correct the batch effect for those two shipment dates only.
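
To make the two bullets above concrete, here is a minimal sketch on simulated data (not TCGA data, and not the procedure actually used for the LUAD/LUSC case). It assumes a balanced design in which the known batch is not confounded with the biological group. The function names `remove_top_pc` and `remove_known_batch` are made up for illustration: the first mimics a blind correction that strips the strongest axis of variation, the second regresses out only a known batch label while keeping the group term in the model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_genes = 40, 200
# Biology of interest (think of two tumour types), first half vs second half.
group = np.repeat([0.0, 1.0], n_samples // 2)
# A known technical batch, alternating so it is balanced within each group.
batch = np.tile([0.0, 1.0], n_samples // 2)

# Simulated log-expression: samples in rows, genes in columns.
expr = rng.normal(size=(n_samples, n_genes))
expr[:, :50] += 3.0 * group[:, None]       # strong biological effect on genes 0-49
expr[:, 100:150] += 2.0 * batch[:, None]   # weaker batch effect on genes 100-149


def mean_gap(x, labels):
    """Euclidean norm of the difference between the two label means."""
    return float(np.linalg.norm(x[labels == 1].mean(axis=0) - x[labels == 0].mean(axis=0)))


def remove_top_pc(x):
    """Blind 'correction': subtract the first principal component."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return x - s[0] * np.outer(u[:, 0], vt[0])


def remove_known_batch(x, batch_labels, group_labels):
    """Targeted correction: fit intercept + group + batch per gene by least
    squares, then subtract only the fitted batch term."""
    design = np.column_stack([np.ones_like(batch_labels), group_labels, batch_labels])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - np.outer(batch_labels, coef[2])


blind = remove_top_pc(expr)
targeted = remove_known_batch(expr, batch, group)

print("group gap  raw / blind / targeted:",
      mean_gap(expr, group), mean_gap(blind, group), mean_gap(targeted, group))
print("batch gap  raw / blind / targeted:",
      mean_gap(expr, batch), mean_gap(blind, batch), mean_gap(targeted, batch))
```

In this toy setup the biological difference is the largest source of variation, so the blind step removes most of it, while the targeted step removes the batch gap and leaves the group gap in place. With real data the batch variable first has to be identified by plotting, as described above, and if batch and group are confounded, no regression can separate them cleanly.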

If you have to use automatically batch-corrected RNA-Seq data from TCGA, I suggest looking at the TCGA batch correction done by MD Anderson. People generally think highly of that team's work.
