Do I need to normalize level 3 TCGA data?
10.2 years ago

Hi. I've recently been working with TCGA microarray data. According to the description, the Level 3 data have already been normalized. However, when I draw boxplots of the expression matrix, many of the chips (all from one platform) have quite different quantiles. Should I normalize and/or log2-transform them?
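The boxplot comparison described above can also be done numerically. The following is a minimal sketch, not a recommended pipeline: it assumes the expression matrix is a plain genes x samples list of lists, and the function names and the `max_log_value` cutoff are hypothetical choices made for illustration. It summarizes each sample's quartiles and applies a rough range check to guess whether the values are already on a log2 scale.

```python
import statistics


def sample_quartiles(matrix):
    """Return (Q1, median, Q3) for each sample (column) of a genes x samples matrix."""
    n_samples = len(matrix[0])
    summaries = []
    for j in range(n_samples):
        column = [row[j] for row in matrix]
        q1, q2, q3 = statistics.quantiles(column, n=4, method="inclusive")
        summaries.append((q1, q2, q3))
    return summaries


def looks_log_transformed(matrix, max_log_value=30.0):
    """Rough heuristic (an assumption, not a rule): log2 intensities rarely
    exceed a few tens, while raw intensities usually run into the thousands."""
    largest = max(max(row) for row in matrix)
    return largest <= max_log_value


# Toy data: 5 genes x 2 samples; the second sample is shifted upward.
toy = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
for index, quartiles in enumerate(sample_quartiles(toy)):
    print(f"sample {index}: Q1, median, Q3 = {quartiles}")
print("looks log-transformed:", looks_log_transformed(toy))
```

If the per-sample quartiles differ widely, that alone does not show the data are un-normalized; the differences could also reflect real biology, so the check is only a starting point.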

TCGA microarray • 14k views

Level 3 microarray data should already be normalized. Which microarray is this? Have you checked whether these differences correlate with tumor type or subtype? Depending on the platform, these might be real genome-wide effects.


It's U133A. They are all GBM tumors (no controls). As for subtypes, the clinical data for this platform doesn't include that information.

10.2 years ago

Normalization happens at Level 2 as explained here. So Level 3 TCGA data should be post-normalization and in a format more suitable for making interpretations. But different tumor-specific working groups may do the job differently. GBM was one of the earliest TCGA projects where a lot of lessons were yet to be learned - like abandoning U133A for RNA-seq based expression data. ;)

In general, you should read the methods sections in the tumor-specific marker papers. For TCGA GBM, the supplement here explains the "Creation of a unified Expression Dataset", but it's not clear whether this needed to be done on top of the Level 3 data.


I see. Should I use RNA-seq instead? It seems, though, that RNA-seq has fewer participants than microarray.


Yes, I would recommend RNA-seq RSEMs, but I'm fairly certain that RNA-seq was not done for TCGA GBM. There was a lot of work done in this manuscript to make the most of the microarray data.


Hi! I am actually looking at TCGA Level 3 RNASeqV2 data. My goal is to find the DEGs (tumor vs. normal), and I'm looking at LUAD now.

I am using edgeR at the moment, since the original RSEM paper mentioned that RSEM output can be processed by edgeR.

I was wondering if it makes sense to include all the tumor samples available, including those that don't have the matching normal samples from the same participant, and analyze for the DEGs? What kind of normalization method would be recommended if I do so? Or can I just use the edgeR default normalization?
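Before choosing between a paired and an unpaired design, it can help to see how many tumor samples actually have a matched normal. The sketch below assumes the sample columns are named with standard TCGA barcodes, where the first three dash-separated fields identify the participant and the first two characters of the fourth field give the sample type (`01` for primary solid tumor, `11` for solid tissue normal). The function name and the example barcodes are made up for illustration.

```python
from collections import defaultdict

TUMOR_CODES = {"01"}   # primary solid tumor
NORMAL_CODES = {"11"}  # solid tissue normal


def split_by_pairing(barcodes):
    """Group barcodes by participant and separate paired from unpaired tumors."""
    by_participant = defaultdict(lambda: {"tumor": [], "normal": []})
    for barcode in barcodes:
        fields = barcode.split("-")
        participant = "-".join(fields[:3])   # e.g. TCGA-AA-0001
        sample_code = fields[3][:2]          # e.g. "01" from "01A"
        if sample_code in TUMOR_CODES:
            by_participant[participant]["tumor"].append(barcode)
        elif sample_code in NORMAL_CODES:
            by_participant[participant]["normal"].append(barcode)

    paired, unpaired_tumor = {}, []
    for participant, groups in by_participant.items():
        if groups["tumor"] and groups["normal"]:
            paired[participant] = groups
        else:
            unpaired_tumor.extend(groups["tumor"])
    return paired, unpaired_tumor


# Made-up barcodes, for illustration only.
example = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
]
paired, unpaired_tumor = split_by_pairing(example)
print(sorted(paired))
print(unpaired_tumor)
```

The paired participants could feed a design with a participant blocking factor, while the full set supports a plain tumor vs. normal comparison; which is more appropriate is a statistical question this sketch does not settle.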


I don't have a good answer for you. You should post a new question on Biostars. Do this in general, if your question is even slightly unrelated.

9.7 years ago
nnabavi • 0

Hello! I have a similar question: are the normalized values in TCGA Level 3 RNASeqV2 data already normalized to the adjacent normal tissue, or do they represent the tumor tissue alone? The data I downloaded falls under the blue "TN" category, meaning it's Tumor/Matched Normal. Also, would the unit of measurement be intensity, RPKM/FPKM, or fold change? Thanks for any help!


Hi nnabavi, and welcome. An important point: this should be a new question, not an answer to a different question.

Also, read up on some info here.

If you're still in doubt, delete this post using the "moderate -> delete" links and create a new question. This will allow the Biostar gurus to help you much better :)
