Genomic matrix data from TCGA needs to be analysed for differential gene expression
9.6 years ago
David_emir ▴ 500

Hello All,

I have retrieved the expression data matrix for TCGA breast invasive carcinoma (BRCA). It is Level_3 data (file names: *.rsem.genes.normalized_results) downloaded from the TCGA DCC and then log2(x+1)-transformed.

In the data matrix, each row represents a feature (gene name) and each column corresponds to a sample. For breast invasive carcinoma (BRCA), TCGA has 1,215 BRCA patient samples that were RNA-sequenced on the Illumina HiSeq 2000 system. The sequence data were processed by the RNA-seq version 2 pipeline, which uses the MapSplice alignment algorithm and the RSEM algorithm to generate expression values; these were then log2(x+1)-transformed. The data matrix looks as follows:

Genomic Matrix

sample      TCGA-A8-A092-01   TCGA-A7-A0CE-11   TCGA-OL-A5D7-01   TCGA-D8-A1JK-   TCGA-E2-A10C-01
ARHGEF10L   8.8784            11.977            8.8784            11.977          8.8784
HIF3A       11.977            8.8784            11.977            8.8784          11.977

The data matrix file can be found at https://drive.google.com/file/d/0B4EniZCsdQJ5cEJZSTBCc1htYk0/view?usp=sharing

Please note: the data matrix is ~20,783 rows × 1,215 columns.

My question is: can this log2(x+1)-transformed, processed data be used for differential gene expression analysis together with the clinical data?

If yes, please let me know how to proceed and which pipeline/software to use.

Thanks a lot for your kind help

-Ateeq Khaliq

genomicMatrix DGE TCGA • 5.5k views
9.6 years ago
Deepak Tanwar ★ 4.2k

The BRCA data already contain both patient (tumor) samples and controls. Did you separate them?

This depends entirely on what kind of clinical analysis you want to integrate.

Do you want to check differential gene expression between patient statuses?

Please elaborate on what you mean by "differential gene expression analysis along with clinical data".
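
On separating the samples: here is a minimal Python sketch, assuming the column headers are standard TCGA barcodes whose fourth dash-separated field starts with the two-digit sample type code (01-09 for tumor, 10-19 for normal). The function names `sample_type` and `split_by_sample_type` are hypothetical, and a truncated barcode such as `TCGA-D8-A1JK-` is set aside rather than guessed.

```python
def sample_type(barcode):
    """Return 'tumor', 'normal', or None for a TCGA barcode.

    Assumes the fourth dash-separated field starts with the two-digit
    sample type code (01-09 = tumor, 10-19 = normal).
    """
    fields = barcode.split("-")
    if len(fields) < 4 or len(fields[3]) < 2 or not fields[3][:2].isdigit():
        return None  # truncated or malformed barcode
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return None  # any other code is left unclassified


def split_by_sample_type(barcodes):
    """Group column indices by sample type; unknowns go to 'unclassified'."""
    groups = {"tumor": [], "normal": [], "unclassified": []}
    for index, barcode in enumerate(barcodes):
        groups[sample_type(barcode) or "unclassified"].append(index)
    return groups


# Column headers as shown in the question (first column "sample" omitted).
header = ["TCGA-A8-A092-01", "TCGA-A7-A0CE-11", "TCGA-OL-A5D7-01",
          "TCGA-D8-A1JK-", "TCGA-E2-A10C-01"]
groups = split_by_sample_type(header)
```

The resulting index lists can then be used to pull the tumor and normal columns out of each gene row before any group comparison.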


Hi Deepak,

Thanks for your reply. Yes, I did separate control vs. diseased (breast cancer) samples, and I also grouped them according to the age of the patients.

So what I really want to do is find DGE between control and BRCA patients, and DGE between different age groups.

Since I don't have the infrastructure to download the huge raw data, my only option is to work with the processed data. I may sound stupid, but this is the only option left to me. Please help. Thanks a lot.

9.6 years ago
Deepak Tanwar ★ 4.2k

Hi Atheeq,

You could find the DEGs between groups by applying a t-test or a Wilcoxon test. You could also calculate the log fold change.
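
As a rough illustration for one gene: because the values are already log2(x+1)-transformed, the log2 fold change is simply the difference of the two group means. The Python sketch below uses only the standard library and computes Welch's t statistic (unequal variances) with its approximate degrees of freedom. The function name `welch_t` and the toy numbers are made up for illustration; for actual p-values you would pass the same two groups to an existing routine such as R's `t.test` or `wilcox.test`.

```python
from statistics import mean, variance


def welch_t(tumor, normal):
    """Return (log2 fold change, Welch t statistic, degrees of freedom).

    Inputs are log2(x+1) expression values for one gene, so the log2 fold
    change is the difference of the group means.
    """
    n_t, n_n = len(tumor), len(normal)
    log2_fc = mean(tumor) - mean(normal)
    # Squared standard error contributed by each group (sample variance / n).
    se_t = variance(tumor) / n_t
    se_n = variance(normal) / n_n
    t_stat = log2_fc / (se_t + se_n) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_t + se_n) ** 2 / (se_t ** 2 / (n_t - 1) + se_n ** 2 / (n_n - 1))
    return log2_fc, t_stat, df


# Toy values for a single gene (not taken from the BRCA matrix).
tumor_values = [10.0, 11.0, 12.0, 11.0]
normal_values = [8.0, 9.0, 8.0, 9.0]
log2_fc, t_stat, df = welch_t(tumor_values, normal_values)
```

Genes whose values are constant in both groups give a zero denominator here, so they would need to be filtered out first.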

Can you please let me know how to go about it? Are there any packages or tools available for the analysis? Thanks a ton, Deepak.

If you are using R, type the following in R for help:

?t.test

?wilcox.test

8.4 years ago
elizabethR ▴ 70

I've been told by a bioinformatician to use edgeR for differential expression analysis. However, as I understand it, the input cannot be normalised data; it has to be raw counts (i.e. the rsem.genes.results files rather than the normalised files), because edgeR normalises the data as part of its mathematical modelling algorithm.


You always have 2 options:

  1. Normalize the data (counts) with the edgeR package. You may have done this already.
  2. Normalize the data yourself (upper-quartile normalization) and then just calculate differential expression using edgeR.
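
To make option 2 a little more concrete, here is a small Python sketch of upper-quartile normalization on raw counts, using only the standard library. It divides one sample's counts by the 75th percentile of that sample's non-zero counts and rescales to a fixed target. The function name, the target of 1000, the exclusion of zeros, and the percentile convention are all assumptions of this sketch, not a statement of what edgeR or the TCGA pipeline do internally.

```python
from statistics import quantiles


def upper_quartile_normalize(counts, target=1000.0):
    """Scale one sample's raw counts so its upper quartile equals `target`.

    `counts` is a list of raw counts for one sample (one matrix column).
    The upper quartile is taken over non-zero counts only (an assumption).
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero counts")
    # quantiles(..., n=4) returns the three quartile cut points.
    upper_quartile = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / upper_quartile * target for c in counts]


# Toy column of raw counts (made up for illustration).
sample_counts = [0, 10, 20, 30, 40, 50]
normalized = upper_quartile_normalize(sample_counts)
```

Each sample (column) is normalized independently, so the function would be applied once per column of the count matrix.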

You might be better off using limma::voom for a dataset of this magnitude.
