Hello All,
I have retrieved the gene expression data matrix for TCGA breast invasive carcinoma (BRCA). The data are Level 3 data (file names: *.rsem.genes.normalized_results) downloaded from the TCGA DCC, then log2(x+1) transformed and processed.
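Since every value in the matrix is y = log2(x + 1), the underlying normalized RSEM value can be recovered as x = 2^y - 1. A minimal sketch (the function names are illustrative, not part of any TCGA tool):

```python
import math


def to_log2p1(x: float) -> float:
    """Forward transform applied to the Level 3 values: y = log2(x + 1)."""
    return math.log2(x + 1.0)


def from_log2p1(y: float) -> float:
    """Inverse transform: recover the normalized value x = 2**y - 1."""
    return 2.0 ** y - 1.0


# Round trip on a value taken from the example rows of the matrix (ARHGEF10L).
y = 8.8784
x = from_log2p1(y)
print(f"log2(x+1) = {y} -> normalized value ~ {x:.1f}")
```

The +1 offset keeps genes with zero expression finite, because log2(0 + 1) = 0.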
In the data matrix, each row represents a feature (gene name) and each column corresponds to a sample. For BRCA, TCGA provides 1,215 samples that were RNA-sequenced on the Illumina HiSeq 2000 platform. The sequence data were processed with the RNA-seq version 2 pipeline, which uses the MapSplice alignment algorithm and the RSEM algorithm to estimate expression values; these values were then log2(x+1) transformed. The data matrix looks as follows:
Genomic matrix (excerpt):
sample     TCGA-A8-A092-01  TCGA-A7-A0CE-11  TCGA-OL-A5D7-01  TCGA-D8-A1JK-    TCGA-E2-A10C-01
ARHGEF10L  8.8784           11.977           8.8784           11.977           8.8784
HIF3A      11.977           8.8784           11.977           8.8784           11.977
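The column headers are TCGA barcodes, and the two digits after the participant ID encode the sample type: 01-09 are tumor samples, 10-19 are normal samples and 20-29 are control samples. So a column ending in -11 (for example TCGA-A7-A0CE-11) is a normal tissue sample, which is most likely what the "control" group in a tumor vs. normal comparison would consist of. The sketch below, with hypothetical helper names, classifies each column and extracts the 12-character patient ID used to link samples to clinical records:

```python
def sample_type(barcode: str) -> str:
    """Classify a TCGA sample barcode by its two-digit sample-type code.

    The code follows the participant ID, e.g. '01' in TCGA-A8-A092-01:
    01-09 tumor, 10-19 normal, 20-29 control.
    """
    parts = barcode.split("-")
    if len(parts) < 4 or len(parts[3]) < 2 or not parts[3][:2].isdigit():
        return "unknown"  # truncated or malformed barcode
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def patient_id(barcode: str) -> str:
    """The first three fields (TCGA-<site>-<participant>) identify the patient."""
    return "-".join(barcode.split("-")[:3])


header = ["TCGA-A8-A092-01", "TCGA-A7-A0CE-11", "TCGA-OL-A5D7-01",
          "TCGA-D8-A1JK-", "TCGA-E2-A10C-01"]
for bc in header:
    print(bc, patient_id(bc), sample_type(bc))
```

The truncated header TCGA-D8-A1JK- has no sample-type code, so it is reported as "unknown" rather than guessed.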
The data matrix file can be found at https://drive.google.com/file/d/0B4EniZCsdQJ5cEJZSTBCc1htYk0/view?usp=sharing
Please note: the data matrix is ~20,783 rows by 1,215 columns.
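A minimal loader sketch, assuming (not confirmed from the file itself) a tab-separated layout whose first row is "sample" followed by the sample barcodes and whose remaining rows hold a gene name followed by one log2(x+1) value per sample:

```python
import csv
import io
import math
from typing import List, TextIO, Tuple


def read_matrix(handle: TextIO) -> Tuple[List[str], List[str], List[List[float]]]:
    """Parse a genes-by-samples matrix into (genes, samples, values).

    Empty cells and 'NA' are read as NaN (an assumption about how
    missing values might be written).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # header[0] is the 'sample' label
    genes: List[str] = []
    values: List[List[float]] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        values.append([float(v) if v not in ("", "NA") else math.nan
                       for v in row[1:]])
    return genes, samples, values


# Usage on an in-memory copy of part of the excerpt; for the real file use
# open("BRCA_matrix.tsv", newline="")  (hypothetical filename).
text = (
    "sample\tTCGA-A8-A092-01\tTCGA-A7-A0CE-11\n"
    "ARHGEF10L\t8.8784\t11.977\n"
    "HIF3A\t11.977\t8.8784\n"
)
genes, samples, values = read_matrix(io.StringIO(text))
print(len(genes), "genes x", len(samples), "samples")
```

For the full matrix, nested Python lists use a lot of memory; pandas.read_csv(path, sep="\t", index_col=0) is likely a more practical way to load it.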
My question is: can these log2(x+1)-transformed, processed data be used for differential gene expression analysis together with the clinical data?
If so, how should I proceed, and which pipeline or software should I use?
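One deliberately simple option on already log-transformed values is a per-gene Welch t-test between two groups of columns, followed by Benjamini-Hochberg correction for the ~20,000 tests. This is a stand-in technique, not the method of any specific package; tools that model raw counts (for example DESeq2 or edgeR) expect integer read counts rather than log2(x+1) values. The sketch assumes NumPy and SciPy are available; the function names and the toy data are hypothetical:

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * n / j.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def differential_expression(log_expr, group_a, group_b):
    """Per-gene Welch t-test between two column groups of a log2(x+1) matrix.

    Returns (log2 fold change of A vs. B, raw p-values, BH-adjusted p-values).
    """
    a = log_expr[:, group_a]
    b = log_expr[:, group_b]
    # On the log scale, a difference of means is a log2 fold change
    # (between the geometric means of x + 1).
    log2fc = a.mean(axis=1) - b.mean(axis=1)
    _, pvals = stats.ttest_ind(a, b, axis=1, equal_var=False)
    # Genes with zero variance give NaN p-values; adjust only finite ones.
    qvals = np.full_like(pvals, np.nan)
    ok = np.isfinite(pvals)
    qvals[ok] = benjamini_hochberg(pvals[ok])
    return log2fc, pvals, qvals


# Toy data (not TCGA values): 100 genes x 20 samples, 5 genes shifted in group A.
rng = np.random.default_rng(0)
expr = rng.normal(8.0, 1.0, size=(100, 20))
expr[:5, :10] += 3.0
log2fc, p, q = differential_expression(expr, list(range(10)), list(range(10, 20)))
print("genes with q < 0.05:", np.flatnonzero(q < 0.05))
```

With the real matrix, group_a and group_b would be the column indices of the tumor and normal samples.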
Thanks a lot for your kind help.
-Ateeq Khaliq
Hi Deepak,
Thanks for your reply. Yes, I have separated the samples into control vs. diseased (breast cancer), and also by patient age.
What I really want to do is perform differential gene expression (DGE) analysis between control and BRCA samples, and between different age groups.
Since I don't have the infrastructure to download the huge raw data, my only option is to work with the processed data. This may sound naive, but it is the only option I have. Please help. Thanks a lot.
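For the age comparison, each expression column can be linked to the clinical data through its 12-character patient ID (the first three barcode fields) and then assigned to an age group. A sketch under stated assumptions: the clinical ages are assumed to be already parsed into a dictionary keyed by patient ID, the age cut-offs are hypothetical, and only tumor columns (sample-type codes 01-09) are kept by default:

```python
from typing import Dict, List


def age_group(age: int) -> str:
    """Hypothetical bins; choose cut-offs that suit the study question."""
    if age < 50:
        return "<50"
    if age < 70:
        return "50-69"
    return ">=70"


def columns_by_age_group(samples: List[str], ages: Dict[str, int],
                         tumor_only: bool = True) -> Dict[str, List[int]]:
    """Map each age group to the matrix column indices that belong to it."""
    groups: Dict[str, List[int]] = {}
    for col, barcode in enumerate(samples):
        parts = barcode.split("-")
        if len(parts) < 4 or not parts[3][:2].isdigit():
            continue  # truncated barcode: sample type unknown
        if tumor_only and not 1 <= int(parts[3][:2]) <= 9:
            continue  # keep only tumor samples (codes 01-09)
        pid = "-".join(parts[:3])  # patient ID, e.g. TCGA-XX-0001
        if pid in ages:  # patients without clinical age are skipped
            groups.setdefault(age_group(ages[pid]), []).append(col)
    return groups


# Hypothetical barcodes and ages, for illustration only (not real patients).
samples = ["TCGA-XX-0001-01", "TCGA-XX-0002-11", "TCGA-XX-0003-01", "TCGA-XX-0004-01"]
ages = {"TCGA-XX-0001": 45, "TCGA-XX-0002": 62, "TCGA-XX-0003": 71, "TCGA-XX-0004": 58}
groups = columns_by_age_group(samples, ages)
print(groups)
```

Each pair of age groups can then be compared with a two-group test; with more than two groups at once, a multi-group test or a model with age as a covariate would be needed instead.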