Question

TCGA data: comparison of normal and tumor samples

1

8.5 years ago

Mike ★ 1.9k

Hello All,

I am working on TCGA lung cancer data and want to compare the average expression of a set of genes (my genes of interest) in normal and tumor samples. What puzzles me is that the average expression of these genes in normal and tumor samples is very similar in the normalized log2 data (Fig 1, LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt), but differs in the normalized Z-score data (Fig 2, LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt).

Fig 1: using the LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt data (image not shown)

Fig 2: using the LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt data (image not shown)

PS: The x-axis shows the genes in the same order in both figures.

So please help me: which input data is appropriate for this type of comparison?

Thank you...

TCGA • 3.5k views

ADD COMMENT • link updated 8.5 years ago by Shicheng Guo ★ 9.5k • written 8.5 years ago by Mike ★ 1.9k

2

Why do the Z-scores for the primary tumor become so small and always stay near zero? How did you pre-process the data? You need to share the data and code via Dropbox or a link so that you can get more suggestions. Usually, most people would use the data in Figure 1, I think.

ADD REPLY • link 8.5 years ago by Shicheng Guo ★ 9.5k

0

Thanks Shicheng,

I used pre-processed data from Broad GDAC Firehose (https://gdac.broadinstitute.org/). I didn't normalize the data myself; I just downloaded the preprocessed file "LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt" (matrix 576 * 20501), then extracted the subset for my gene list (576 * 88), subdivided it into primary tumor (matrix size = 515 * 88) and normal samples (59 * 88), and finally calculated the mean expression of each gene in each class separately and plotted it.
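For readers who want to reproduce the steps described above, here is a minimal sketch in Python with pandas. It assumes a samples x genes table indexed by TCGA barcodes, and it splits the groups by the sample-type code in the barcode (01 = primary tumor, 11 = solid tissue normal). The barcodes, gene names, values and the helper names `sample_type` and `group_means` are made up for illustration; loading the real Firehose file is left out because its exact layout is not shown in this thread.

```python
import pandas as pd


def sample_type(barcode):
    """Classify a TCGA barcode by its sample-type code (characters 14-15)."""
    code = barcode[13:15]
    if code == "01":
        return "tumor"
    if code == "11":
        return "normal"
    return "other"


def group_means(expr, genes):
    """Return the per-gene mean expression for each sample group.

    expr  : DataFrame, rows = samples (TCGA barcodes), columns = genes
    genes : list of gene names to keep (the gene set of interest)
    """
    subset = expr.loc[:, genes]
    # One group label per sample, aligned on the barcode index
    groups = pd.Series(
        [sample_type(b) for b in subset.index], index=subset.index, name="group"
    )
    # Mean over samples within each group, then genes as rows
    return subset.groupby(groups).mean().T


# Toy data (hypothetical barcodes and values, not real TCGA measurements)
barcodes = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
    "TCGA-AA-0003-11A-11R-0000-07",
    "TCGA-AA-0004-11A-11R-0000-07",
]
expr = pd.DataFrame(
    [[8.0, 5.0, 3.0], [10.0, 5.0, 3.0], [6.0, 5.0, 3.0], [6.0, 7.0, 3.0]],
    index=barcodes,
    columns=["GENE_A", "GENE_B", "GENE_C"],
)

means = group_means(expr, ["GENE_A", "GENE_B"])
print(means)
```

The resulting table has one row per gene and one column per group, which is the shape needed to plot the two curves against the same gene order.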

ADD REPLY • link 8.5 years ago by Mike ★ 1.9k

0

Although I tried hard to find the file you mentioned, I cannot find it in the Firehose database; I don't know why. Anyway, I think I may have guessed why you get this problem. http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/LUAD/20160128/

ADD REPLY • link 8.5 years ago by Shicheng Guo ★ 9.5k

score 4 · Accepted Answer · 2016-06-06

4

8.5 years ago

Shicheng Guo ★ 9.5k

Here, Z_Score means:

Z_Score = [(expression in single tumor sample) - (mean expression in all tumor samples)] / (standard deviation of expression in all tumor samples)

That's why the Z-scores for the cancer group are so small in your Figure 2: averaged over the tumor samples, they come out at zero by construction.

And I think the Z-score for a normal sample is calculated in something like this way:

Z_Score = [(expression in single normal sample) - (mean expression in all normal samples)] / (standard deviation of expression in all normal samples)

That means this curve only shows the fluctuation of the gene expression within that group, not the difference between the groups.

I am pretty sure that you should use the data in Figure 1.
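The point can be checked numerically. The sketch below uses synthetic data (not TCGA values) and assumes, as supposed above, that each gene is standardized against the mean and standard deviation of its own group; `zscore_within` is a hypothetical helper name. Under that assumption the group mean of the Z-scores is zero for every gene, whatever the actual expression level, while the log2-scale means still show the difference between groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log2 expression, rows = samples, columns = genes (made-up values)
tumor = rng.normal(loc=9.0, scale=1.5, size=(50, 4))
normal = rng.normal(loc=6.0, scale=1.0, size=(10, 4))


def zscore_within(x):
    """Standardize each gene (column) against the samples of its own group."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


z_tumor = zscore_within(tumor)
z_normal = zscore_within(normal)

# On the log2 scale the group difference is visible for every gene
log2_diff = tumor.mean(axis=0) - normal.mean(axis=0)
print("log2 mean difference per gene:", log2_diff)

# After within-group standardization each group mean is zero (up to rounding),
# so the tumor-vs-normal difference is no longer visible
print("mean tumor Z per gene:", z_tumor.mean(axis=0))
print("mean normal Z per gene:", z_normal.mean(axis=0))
```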

ADD COMMENT • link 8.5 years ago by Shicheng Guo ★ 9.5k

0

Thanks again,

I'm using preprocessed data from https://gdac.broadinstitute.org/ (http://firebrowse.org/?cohort=LUAD). So I should use the normalized log2 data (the data in Figure 1).

ADD REPLY • link 8.5 years ago by Mike ★ 1.9k

0

Hello, I'm using level 3 normalized data from GDAC Firehose, and I have a question regarding Z-score calculation in tumor samples alone. To calculate the Z-score of a gene (X) in a tumor sample, how should the mean and standard deviation of the reference population be calculated?

Z_Score = [(expression of X in tumor sample (s)) - (mean expression of X in all tumor samples (population))] / (standard deviation of X's expression in all tumor samples), or

Z_Score = [(expression of gene X in tumor sample (s)) - (mean expression of all genes (20K+) in all tumor samples (population))] / (standard deviation of expression of all genes (20K+) in all tumor samples)

Which formula should I use?

Thanks, sumithra

ADD REPLY • link 7.8 years ago by sumithra.das ▴ 10