Hello All,
I am working on TCGA lung cancer data and want to compare the average expression of a set of genes (my genes of interest) in normal and tumor samples. I am puzzled that the average expression of these genes in normal and tumor samples is very similar in the normalized log2 data (Fig 1, LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt), but different in the Z-score data (Fig 2, LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt).
Fig 1, when using (LUAD.uncv2.mRNAseq_RSEM_normalized_log2.txt) data
Fig 2, when using (LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt) data
PS: the x-axis shows the genes in the same order in both figures.
So please help me: which input data is appropriate for this type of comparison?
Thank you...
Why do the Z-scores for the primary tumors become so small and always close to zero? How did you pre-process the data? You need to share the data and code via Dropbox or a link so that you can get more suggestions. Usually, most people would use Figure 1, I think.
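One possible explanation, offered only as a sketch: if each gene's Z-score is computed across all samples together (an assumption here; how the Firehose Z-score file is actually computed is not confirmed in this thread), then the Z-scores of a gene sum to zero over all samples. With 515 tumors and 59 normals, the tumor mean is then forced to be only about 59/515 of the normal mean in magnitude, with the opposite sign. The values below are simulated, not TCGA data.

```python
import numpy as np

# Simulated log2 expression for ONE gene; group sizes follow the thread.
rng = np.random.default_rng(0)
n_tumor, n_normal = 515, 59
tumor = rng.normal(8.0, 1.0, n_tumor)    # hypothetical tumor values
normal = rng.normal(9.0, 1.0, n_normal)  # hypothetical normal values

# ASSUMPTION: the Z-score uses the mean and SD over all samples pooled.
all_values = np.concatenate([tumor, normal])
z = (all_values - all_values.mean()) / all_values.std()

z_tumor_mean = z[:n_tumor].mean()
z_normal_mean = z[n_tumor:].mean()

# Because the Z-scores sum to zero, the two class means are tied together:
# n_tumor * z_tumor_mean + n_normal * z_normal_mean == 0
print("tumor mean Z :", z_tumor_mean)
print("normal mean Z:", z_normal_mean)
print("ratio        :", z_tumor_mean / z_normal_mean, "vs", -n_normal / n_tumor)
```

Under that assumption, a tumor mean near zero says more about the tumors dominating the reference population than about the genes themselves.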
Thanks Shicheng,
I used pre-processed data from Broad GDAC Firehose (https://gdac.broadinstitute.org/). I didn't normalize the data myself; I just downloaded the pre-processed file "LUAD.uncv2.mRNAseq_RSEM_Z_Score.txt" (matrix 576 * 20501), then extracted the subset for my gene list (576 * 88), divided it again into primary tumor (matrix size = 515 * 88) and normal samples (59 * 88), and finally calculated the mean expression of each gene in each class separately and plotted it.
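The steps described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: it assumes the matrix is already loaded as a pandas DataFrame with samples as rows and genes as columns, and that the row labels are TCGA barcodes, where the first two characters of the fourth field give the sample type (01 = primary tumor, 11 = solid tissue normal). The helper names `sample_type` and `class_means`, the gene names, and the barcodes are made up for the example.

```python
import numpy as np
import pandas as pd

def sample_type(barcode):
    """Classify a TCGA barcode by its sample-type code (4th field)."""
    code = barcode.split("-")[3][:2]
    return {"01": "tumor", "11": "normal"}.get(code, "other")

def class_means(expr, genes):
    """Mean expression per gene within each sample class.

    expr: DataFrame, rows = sample barcodes, columns = gene symbols.
    Returns a DataFrame with genes as rows and classes as columns.
    """
    present = [g for g in genes if g in expr.columns]  # skip missing genes
    sub = expr.loc[:, present]
    groups = np.array([sample_type(b) for b in sub.index])
    return sub.groupby(groups).mean().T

# Toy data with hypothetical barcodes and gene names.
expr = pd.DataFrame(
    {"GENE_A": [1.0, 3.0, 10.0],
     "GENE_B": [2.0, 4.0, 20.0],
     "GENE_C": [0.0, 0.0, 0.0]},
    index=["TCGA-XX-0001-01A-01R-0000-07",
           "TCGA-XX-0002-01A-01R-0000-07",
           "TCGA-XX-0003-11A-01R-0000-07"],
)
means = class_means(expr, ["GENE_A", "GENE_B"])
print(means)
```

The resulting table has one row per gene and one column per class, which is the shape needed to plot the tumor and normal means along the same gene order.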
Although I tried hard to find the file you mentioned, I cannot find it in the Firehose database; I don't know why. But anyway, I may have guessed why you get this problem. http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/LUAD/20160128/