Question

Deciding Between TOIL-Processed Data (Xena) and TCGAbiolinks Data for DE Analysis

9 days ago

lyan125 ▴ 10

Hello, bioinformatics community,

I am currently working on a pan-cancer differential expression (DE) analysis and am undecided about which dataset processing pipeline to use. Here's what I have done so far:

Using TCGA data from TCGAbiolinks:

I downloaded data for multiple cancer types via TCGAbiolinks (R).
Integrated all cancer types and removed batch effects using ComBat-seq.
Performed DE analysis using DESeq2.

Using TOIL Processed Data from Xena:

I downloaded TOIL-processed expected counts data from the Xena hub.
Since the TOIL data are log2(x+1)-transformed, I reversed the transformation (2^x - 1) and then performed DE analysis with DESeq2.
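The back-transformation step can be sketched as follows. This is a minimal illustration, not the poster's actual code: the gene names and values are made up, and `back_transform` is a hypothetical helper. It assumes the Xena values are log2(expected_count + 1) and rounds the result, because RSEM expected counts are fractional while DESeq2 expects non-negative integer counts.

```python
def back_transform(log_value: float) -> int:
    """Invert log2(x + 1) and round to the nearest non-negative integer.

    Hypothetical helper: assumes the input is log2(expected_count + 1).
    """
    expected = 2.0 ** log_value - 1.0
    # Clamp at zero to guard against tiny negative values from rounding noise.
    return max(0, round(expected))


# Toy matrix (gene -> per-sample log2(x + 1) values); all values are invented.
toil_log2 = {
    "GENE_A": [0.0, 3.0, 10.0],
    "GENE_B": [1.0, 5.5, 7.25],
}

# Integer-like counts suitable as input to a count-based DE tool.
counts = {
    gene: [back_transform(v) for v in values]
    for gene, values in toil_log2.items()
}

for gene, values in counts.items():
    print(gene, values)
```

Rounding discards the fractional part of the expected counts, so it may be worth checking that low-count genes are not disproportionately affected before filtering.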

Observations:

Interestingly, the differentially expressed genes (DEGs) identified from the TOIL data were closer to the DEGs I obtained from CCLE data (via DepMap) than those from the TCGAbiolinks pipeline were, which made me lean towards using the TOIL pipeline for my study.
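One way to make "closer" concrete is to score the overlap between DEG lists, for example with the Jaccard index. The sketch below is only an illustration under that assumption: the gene sets are invented placeholders, not results from either pipeline, and the post does not say which similarity measure was actually used.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder DEG sets; real lists would come from each DESeq2 result table.
tcgabiolinks_degs = {"TP53", "MYC", "EGFR", "KRAS"}
toil_degs = {"TP53", "MYC", "CDKN2A", "PTEN"}
ccle_degs = {"TP53", "MYC", "CDKN2A", "BRAF"}

# Compare each TCGA-derived list against the CCLE reference list.
sim_biolinks = jaccard(tcgabiolinks_degs, ccle_degs)
sim_toil = jaccard(toil_degs, ccle_degs)

print(f"TCGAbiolinks vs CCLE: {sim_biolinks:.2f}")
print(f"TOIL vs CCLE:         {sim_toil:.2f}")
```

A set-overlap score ignores effect direction and size, so comparing log2 fold changes of shared genes would be a useful complement.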

However, something still bothers me, and I am not confident that this choice is correct.

Are there significant downsides to using TOIL-processed Xena data instead of TCGAbiolinks data for DE analysis?

I am looking for insight or validation to confirm that my approach is methodologically sound.

Thank you for your help!

DESeq TCGA Xena RNA-Seq BIOLINKS • 351 views
