Deciding Between TOIL Processed Data (Xena) vs. TCGA Biolinks Data for DE Analysis
9 days ago
lyan125 ▴ 10

Hello, bioinformatics community,

I am currently working on a pan-cancer differential expression (DE) analysis and am undecided about which dataset processing pipeline to use. Here's what I have done so far:

Using TCGA data from TCGAbiolinks:

  1. I downloaded data for multiple cancer types via TCGAbiolinks (R).
  2. I integrated all cancer types and removed batch effects using ComBat-seq.
  3. I performed DE analysis using DESeq2.

Using TOIL Processed Data from Xena:

  1. I downloaded TOIL-processed expected counts data from the Xena hub.
  2. Since the TOIL data is log2(x+1)-transformed, I reversed the transformation and then performed DE analysis with DESeq2.
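
To make step 2 concrete, here is a minimal sketch of the back-transformation, written in Python purely for illustration. It assumes the values are stored as log2(expected_count + 1); the function name `log2p1_to_counts` is hypothetical. The rounding step is an assumption on my side: RSEM expected counts can be fractional, while DESeq2 requires integer counts.

```python
import math


def log2p1_to_counts(log_values):
    """Invert log2(x + 1) and round to non-negative integer counts.

    Assumption: the input values are log2(expected_count + 1).
    """
    counts = []
    for value in log_values:
        raw = 2.0 ** value - 1.0  # inverse of log2(x + 1)
        # Expected counts may be fractional, so round to the nearest integer
        # and clamp at zero to guard against tiny negative floating-point error.
        counts.append(max(0, int(round(raw))))
    return counts


# Example: log2(0 + 1), log2(1 + 1), log2(10 + 1), log2(2.6 + 1)
example = [0.0, 1.0, math.log2(11.0), math.log2(3.6)]
recovered = log2p1_to_counts(example)
```

Rounding changes fractional expected counts slightly, so it is worth checking that the recovered totals per sample still look sensible before running DE analysis.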

Observations:

Interestingly, the differentially expressed genes (DEGs) identified from the TOIL data were closer to the DEGs I obtained from CCLE data (via DepMap), which made me lean towards using the TOIL pipeline for my study.

However, something still bothers me, and I am not confident that this choice is correct.

Are there significant downsides to using TOIL-processed Xena data instead of TCGAbiolinks data for DE analysis?

I am looking for insights or validation that my approach is methodologically sound.

Thank you for your help!

DESeq TCGA Xena RNA-Seq BIOLINKS • 351 views