Question

Genomic matrix data from TCGA needs to be analysed for differential gene expression

1

9.9 years ago

David_emir ▴ 500

Hello All,

I have retrieved the expression data matrix for TCGA breast invasive carcinoma (BRCA). It is Level 3 data (file names: *.rsem.genes.normalized_results) downloaded from the TCGA DCC and then log2(x+1) transformed.

In the data matrix, each row represents a feature (gene name) and each column corresponds to a sample. For breast invasive carcinoma (BRCA), TCGA holds 1,215 patient samples, which were RNA-sequenced on the Illumina HiSeq 2000 system. The sequence data were processed by the RNA-seq version 2 pipeline, which uses the MapSplice alignment algorithm and the RSEM algorithm to generate expression values. These values were then log2(x+1) transformed. The data matrix looks as follows:

Genomic Matrix

sample      TCGA-A8-A092-01   TCGA-A7-A0CE-11   TCGA-OL-A5D7-01   TCGA-D8-A1JK-   TCGA-E2-A10C-01
ARHGEF10L   8.8784            11.977            8.8784            11.977          8.8784
HIF3A       11.977            8.8784            11.977            8.8784          11.977

The data matrix file can be found at https://drive.google.com/file/d/0B4EniZCsdQJ5cEJZSTBCc1htYk0/view?usp=sharing

Please note: the data matrix is ~20,783 rows × 1,215 columns.
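For readers who want to load a matrix laid out like the excerpt above, here is a minimal sketch. It assumes the file is tab-delimited with the gene name in the first column, which the post does not state; the function name `read_matrix` and the small in-memory stand-in are hypothetical.

```python
import csv
import io

def read_matrix(handle):
    """Read a gene-by-sample matrix: a header row of sample IDs, then one row per gene."""
    reader = csv.reader(handle, delimiter="\t")  # assumption: tab-delimited
    header = next(reader)
    samples = header[1:]  # the first header cell is the "sample" label
    expr = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        expr[row[0]] = [float(v) for v in row[1:]]
    return samples, expr

# Tiny stand-in for the real file, reusing values from the excerpt above.
demo = io.StringIO(
    "sample\tTCGA-A8-A092-01\tTCGA-A7-A0CE-11\n"
    "ARHGEF10L\t8.8784\t11.977\n"
    "HIF3A\t11.977\t8.8784\n"
)
samples, expr = read_matrix(demo)
```

With the real file, an open file handle would be passed in place of `demo`.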

My question is: can this log2(x+1)-transformed, processed data be used for differential gene expression analysis together with the clinical data?

If yes, please let me know how to proceed and which pipeline/software to use.

Thanks a lot for your kind help

-Ateeq Khaliq

genomicMatrix DGE TCGA • 5.7k views

updated 2.2 years ago by Ram 45k • written 9.9 years ago by David_emir ▴ 500

Ram · Answer 1 · 2015-05-06

5

9.9 years ago

Deepak Tanwar ★ 4.2k

The BRCA data should already contain both patient (tumor) samples and controls. Did you separate them?

This depends entirely on what kind of clinical analysis you want to integrate.

Do you want to check differential gene expression between patient statuses?

Please elaborate on what you mean by "differential gene expression analysis along with clinical data".
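A minimal sketch of separating tumor from control columns, assuming the column names are TCGA barcodes whose fourth dash-separated field starts with the two-digit sample-type code (01 to 09 for tumor, 10 to 19 for normal). The helper name `sample_group` is hypothetical, and IDs with a missing code are reported as "unknown" rather than guessed.

```python
def sample_group(barcode):
    """Classify a TCGA barcode by its sample-type code (4th dash-separated field)."""
    parts = barcode.split("-")
    if len(parts) < 4:
        return "unknown"
    code_str = parts[3][:2]
    if len(code_str) != 2 or not code_str.isdigit():
        return "unknown"  # truncated or malformed ID
    code = int(code_str)
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"

# Sample IDs taken from the matrix excerpt in the question.
samples = ["TCGA-A8-A092-01", "TCGA-A7-A0CE-11", "TCGA-OL-A5D7-01"]
tumor_idx = [i for i, s in enumerate(samples) if sample_group(s) == "tumor"]
normal_idx = [i for i, s in enumerate(samples) if sample_group(s) == "normal"]
```

The resulting index lists can then be used to pull the tumor and control values out of each gene's row.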

updated 2.2 years ago by Ram 45k • written 9.9 years ago by Deepak Tanwar ★ 4.2k

0

Hi Deepak,

Thanks for your reply. Yes, I did separate control vs. diseased (breast cancer) samples, and I also grouped the patients by age.

What I really want to do is find DGE between control and BRCA patients, and DGE between different age groups.

Since I don't have the infrastructure to download the huge raw data, my only option is to work with the processed data. I may sound stupid, but this is the only option left for me. Please help. Thanks a lot.

updated 2.2 years ago by Ram 45k • written 9.9 years ago by David_emir ▴ 500

score 5 · Answer 2 · 2015-05-08

5

9.9 years ago

Deepak Tanwar ★ 4.2k

Hi Ateeq,

You could find the DEGs between groups by applying a t-test or a Wilcoxon test. You could also compute the log fold change.
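As a rough illustration of the per-gene calculation, here is a sketch using only the Python standard library. It assumes the values are already on the log2 scale, so the difference of group means acts as a log2 fold change, and it uses Welch's unequal-variance form of the t-test. The function names and the expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def log2_fold_change(tumor, normal):
    """Difference of group means; on log2-scale data this is a log2 fold change."""
    return mean(tumor) - mean(normal)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Illustrative log2(x+1) values for one gene (not real TCGA data).
tumor = [9.1, 8.7, 9.4, 9.0]
normal = [11.8, 12.1, 11.6, 12.0]

lfc = log2_fold_change(tumor, normal)
t, df = welch_t(tumor, normal)
```

Turning `t` and `df` into a p-value needs the t distribution, which R's `t.test` (Welch by default) or `scipy.stats.ttest_ind(..., equal_var=False)` would handle. With roughly 20,000 genes tested, the p-values would also need a multiple-testing adjustment.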

9.9 years ago by Deepak Tanwar ★ 4.2k

0

Can you please let me know how to go about it? Are there any packages or tools available for this analysis? Thanks a ton, Deepak.

9.9 years ago by David_emir ▴ 500

1

If you are using R, type the following in R for help:

?t.test

?wilcox.test

9.9 years ago by Deepak Tanwar ★ 4.2k

Ram · Answer 3 · 2016-06-22

0

8.8 years ago

elizabethR ▴ 70

I've been told by a bioinformatician to use edgeR for differential expression analysis. However, as I understand it, the input cannot be normalised data; it has to be raw counts (i.e. the rsem.genes.results files rather than the normalised files), because edgeR normalises the data as part of its modelling.

updated 2.2 years ago by Ram 45k • written 8.8 years ago by elizabethR ▴ 70

1

You always have two options:

Normalize the data (counts) with the edgeR package. You may already have done this.
Normalize the data yourself (upper-quartile normalization) and then just calculate differential expression using edgeR.
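To make the second option concrete, here is a minimal sketch of upper-quartile scaling for a single sample, written in plain Python rather than edgeR. It assumes zeros are excluded before taking the 75th percentile and that counts are rescaled to a `target` of 1000. Both are assumptions for illustration, as is the function name `upper_quartile_scale`.

```python
from statistics import quantiles

def upper_quartile_scale(counts, target=1000.0):
    """Scale one sample's counts so the 75th percentile of its
    non-zero counts equals `target` (assumed value)."""
    nonzero = [c for c in counts if c > 0]  # assumption: zeros are ignored
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q75 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c * target / q75 for c in counts]

# Illustrative counts for one sample (not real data).
scaled = upper_quartile_scale([0, 10, 20, 30, 40, 50])
```

In edgeR itself, the comparable step would be `calcNormFactors` with `method = "upperquartile"`, which computes scaling factors used by the model instead of rewriting the counts.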