TCGA data analysis, raw_count?!
9.5 years ago
juara ▴ 40

Hello

I would appreciate it if you could help me analyze the TCGA data. Here is what I have done so far:

  • Downloaded the PRAD (prostate) RNASeqV2 data, consisting of 550 patients
  • Downloaded the clinical data for PRAD
  • Matched the two in Excel using the barcode
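Matching the expression columns to the clinical table can also be done in a script instead of Excel. Below is a minimal pandas sketch, assuming the expression matrix has one column per TCGA sample barcode (e.g. `TCGA-XX-XXXX-01A-...`) and the clinical table has a `bcr_patient_barcode` column; both layouts and the function names are illustrative assumptions, not a fixed TCGA file format. It relies on the TCGA barcode convention that the first 12 characters identify the patient and characters 14-15 give the sample type (01-09 tumor, 10-19 normal).

```python
import pandas as pd


def patient_id(barcode: str) -> str:
    """Return the 12-character participant part of a TCGA barcode, e.g. 'TCGA-XX-XXXX'."""
    return barcode[:12]


def is_tumor_sample(barcode: str) -> bool:
    """Characters 14-15 are the sample type: 01-09 tumor, 10-19 normal."""
    return int(barcode[13:15]) < 10


def match_expression_to_clinical(expr: pd.DataFrame, clinical: pd.DataFrame) -> pd.DataFrame:
    """
    expr: genes as rows, one column per sample barcode (assumed layout).
    clinical: one row per patient with a 'bcr_patient_barcode' column (assumed name).
    Returns one row per expression sample, with the patient's clinical fields attached.
    """
    samples = pd.DataFrame({"sample_barcode": list(expr.columns)})
    samples["bcr_patient_barcode"] = samples["sample_barcode"].map(patient_id)
    samples["is_tumor"] = samples["sample_barcode"].map(is_tumor_sample)
    # Left join keeps every expression sample, even if clinical data is missing.
    return samples.merge(clinical, on="bcr_patient_barcode", how="left")
```

Because several samples (for example a tumor and a matched normal) can belong to one patient, joining on the 12-character patient ID rather than the full barcode is what makes the match work.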

Now my question is whether I should use the "raw_count" or the "scaled_estimate" values for my analysis. For example, I want to look at the differential expression of EGFR in the no-tumor group vs. the with-tumor group. Can I simply average the "raw_count" values and compare the two groups? Or should I apply some sort of transformation? Or is scaled_estimate multiplied by 10^6 more accurate? The scaled_estimate values are very low, around 2-10 × 10^-5; does that mean the gene is not being transcribed much?

Sorry for being naive in this field, but I would appreciate any ideas and comments.

Thanks

RNA-Seq R TCGA

9.5 years ago
roy.granit ▴ 890

You can read more about the TCGA data types here. Briefly, raw_count is the (estimated) number of reads assigned to that gene, while scaled_estimate is the estimated fraction of transcripts coming from that gene, so multiplying it by 10^6 gives transcripts per million (TPM). Note that you also have the normalized_count data, which are the raw counts scaled by the 75th percentile (upper quartile) of each sample's column.
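To make the relationship between these quantities concrete, here is a small numpy sketch of upper-quartile scaling and of converting scaled_estimate to TPM. It is an illustration under assumptions, not the exact TCGA pipeline: the rescaling target of 1000, taking the 75th percentile over nonzero genes only, and the function names are all assumptions.

```python
import numpy as np


def upper_quartile_normalize(raw_counts: np.ndarray, target: float = 1000.0) -> np.ndarray:
    """
    raw_counts: genes x samples matrix of raw_count values.
    Each sample (column) is divided by its own 75th percentile and multiplied by `target`.
    Assumption: the percentile is taken over genes with a nonzero count, and every
    sample has at least one nonzero count.
    """
    normalized = np.empty(raw_counts.shape, dtype=float)
    for j in range(raw_counts.shape[1]):
        col = raw_counts[:, j].astype(float)
        uq = np.percentile(col[col > 0], 75)
        normalized[:, j] = col / uq * target
    return normalized


def scaled_estimate_to_tpm(scaled_estimate) -> np.ndarray:
    """scaled_estimate is a fraction of transcripts; times 10^6 gives TPM."""
    return np.asarray(scaled_estimate, dtype=float) * 1e6
```

For example, a scaled_estimate of 2 × 10^-5 corresponds to 20 TPM. Such small fractions are expected, because each value is a share of the sample's whole transcriptome rather than a read count.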

I believe most people take the normalized counts, log2-transform them (usually after adding a pseudocount of 1 so that zero counts stay defined), and then compare between samples. Because each sample has already been scaled by its own upper quartile, you can compare different samples without further normalization.
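The log2 approach can be sketched with the standard library alone. This hypothetical helper log-transforms normalized counts with a pseudocount of 1 (an assumed, commonly used choice), then reports the difference of the group means on the log2 scale (a log2 fold change) together with Welch's t statistic. Using normalized counts rather than raw_count accounts for differing sequencing depth, and averaging on the log scale keeps a few very highly expressed samples from dominating a group mean.

```python
import math
from statistics import mean, variance


def log2_expression(normalized_counts, pseudocount=1.0):
    """log2(x + pseudocount) for each normalized count."""
    return [math.log2(x + pseudocount) for x in normalized_counts]


def compare_groups(group_a, group_b):
    """
    Returns (log2 fold change of a over b, Welch's t statistic), both computed
    on log2(x + 1) values. Each group needs at least two samples.
    """
    a = log2_expression(group_a)
    b = log2_expression(group_b)
    diff = mean(a) - mean(b)
    # Welch's standard error does not assume equal variances in the two groups.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff / se
```

With real data, `group_a` and `group_b` would be the EGFR normalized_count values of the two clinical groups. For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic on the log2 values.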

I would recommend two very useful tools that will save you a lot of time and spare you tedious spreadsheet work:

  1. https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/ - cancer browser
  2. http://www.cbioportal.org/ - cBioPortal

Both tools allow you to analyze the TCGA data very easily.
