TCGA: Do TCGA cancer studies have mRNA expression data for Control/Normal samples?
1
18
10.3 years ago
komal.rathi ★ 4.1k

Hi everyone,

I am using the TCGA portal to get mRNA expression data for various cancer studies (e.g. lung, liver, thyroid, etc.). We have been on the lookout for control/normal samples for the cancer studies on TCGA. On the website we could find case/tumor samples but couldn't find any control samples.

Has anyone used control/normal samples from TCGA and can point me to them? Or do you know of a good resource (preferably using RNASeq V2 RSEM normalized expression values or z-scores) for control/normal samples in tissues like lung, liver, thyroid, etc. (basically all the foregut tissues)?

Thanks!

RSEM controls TCGA normals RNASeq • 33k views
ADD COMMENT
4

You can use TCGA-Assembler for that. There is a Nature Methods paper "describing it" (see ref on the link).

When you download the data using the DownloadRNASeqData function, you can specify whether you want normal, primary tumor, recurrent tumor or metastatic samples. This will download RNASeqV1 or V2 Level 3 data (RSEM normalized or not). You will have to transform it into z-scores yourself, though.

You can do that by following this thread in Google Groups: match the sample names (for matched samples), or take the average of the normal controls for the non-matched data.
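
For the non-matched case, here is a minimal sketch of the z-score step in plain Python. It assumes you already have the expression values of one gene split into tumor and normal samples. The function name `zscores_vs_normals` and the numbers are made up for illustration and are not part of TCGA-Assembler.

```python
from statistics import mean, stdev


def zscores_vs_normals(tumor_values, normal_values):
    """Z-score each tumor value of one gene against the normal samples."""
    mu = mean(normal_values)
    sigma = stdev(normal_values)  # sample standard deviation, needs >= 2 normals
    if sigma == 0:
        raise ValueError("normal samples have zero variance for this gene")
    return [(x - mu) / sigma for x in tumor_values]


# Hypothetical expression values for a single gene (illustrative numbers only).
normal = [10.0, 12.0, 14.0]
tumor = [16.0, 12.0]
print(zscores_vs_normals(tumor, normal))  # [2.0, 0.0]
```

You would run this gene by gene; whether to log-transform the values first is a separate choice that this sketch leaves out.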

ADD REPLY
0

Thanks, what russhh said worked for me, but I will definitely give this a try. Looks promising!

ADD REPLY
0

TCGA-Assembler is out of service. Is there a good alternative?

ADD REPLY
2

Hi,

Since TCGA data are now on the NCI website, how can I download gene expression data (FPKM) for breast cancer and the associated normal tissue? I cannot find any "normal tissue" option (maybe I missed it).

For example here's the selection for breast cancer expression data.

ADD REPLY
0

Since this is a separate query, you might consider starting a new question

ADD REPLY
1

There's certainly RNASeq data from matched normal samples (i.e., normal lung tissue from a lung cancer patient) for the lung samples, e.g. TCGA-44-2655-11 here.

ADD REPLY
0

So, there are a lot more TN (tumor samples that have matched normals) than NT (normal samples that have matched tumors). How is this possible? Shouldn't the number of TN be the same as NT?

ADD REPLY
0

I don't know what you mean, that's certainly not what I thought I'd said - apologies.

There are very few control samples (i.e., normal lung tissue from individuals who do not have cancer), but for around 20-25% of the lung tumour samples there is an associated matched-normal lung sample.

Hence, there are more tumour samples without a matched-normal sample than tumour samples with one.

ADD REPLY
0

What I meant is that, according to this & this, sample names ending in 01 are tumor and those ending in 11 are normal. When I went to the data matrix on TCGA for LUAD, there were options like Tumor-matched & Normal-matched. Also, according to this:

  • TN (Tumor, matched normal) - Data for a tumor tissue for which matched normal tissue exists.
  • NT (Normal, matched tumor) - Data for normal tissue for which matched tumor tissue exists.

So I am a bit confused: shouldn't there be equal numbers of TN and NT when you check the data matrix?

ADD REPLY
1

Hi komal.rathi, if I want to analyze the TCGA data discussed above with a differential expression test (for paired data), is the TN set too small compared with the NT set for a given cancer type? That might bias the result.

Maybe it would be better to use RNASeq data from normal samples (from individuals without any cancer) as the control set for the differential analysis against a given cancer? Could you point me to where I could get an RNASeq dataset that is comparable with TCGA?

Thanks!

ADD REPLY
0

@ komal.rathi

I need to download RNA-Seq data (only raw read counts for gene quantification) for ovarian cancer patients from TCGA. I am not interested in downloading all the cases present in TCGA; I want a reasonable number of patients with a tumor and its matched normal for which I can retrieve the RNA-Seq raw counts. I am a bit confused about which selection criteria I should use. I have downloaded the 489 cases of OvaCa data from TCGA with gene expression values, but there is no indication of which are normal and which are tumor. Can you let me know how to do this from the portal? Correct me if I am wrong: I should first select TN RNA-Seq data for OV (color code blue), which will give batch-wise RNA-Seq V1 for tumor tissues. Then I should select NT to find the expression data of the normal samples for which I downloaded tumor data, right? Please share your ideas.

ADD REPLY
1

ivivek_ngs

I am assuming you have the barcodes, e.g. TCGA-09-0364-01, for each of your samples. This is the code table I referred to. The last two digits tell you whether it is a tumor or a normal sample. I used TCGA-Assembler to first download everything and then extracted the matched tumor and normal samples. When you download from the data matrix, blue is for matched tumor samples and yellow is for matched normal samples.
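
To make the extraction step concrete, here is a rough sketch of pulling out matched pairs from a list of barcodes. It assumes barcodes of the form shown above, where the first 12 characters identify the patient and the next two digits are the sample type. `matched_pairs` is a name I made up for this example (it is not a TCGA-Assembler function), and the tumor barcode for patient TCGA-44-2655 is invented to complete the pair.

```python
from collections import defaultdict


def matched_pairs(barcodes):
    """Group sample barcodes by patient; keep patients with both tumor and normal."""
    by_patient = defaultdict(lambda: {"tumor": [], "normal": []})
    for bc in barcodes:
        patient = bc[:12]        # e.g. TCGA-09-0364
        code = int(bc[13:15])    # sample type digits, e.g. 01 or 11
        if 1 <= code <= 9:
            by_patient[patient]["tumor"].append(bc)
        elif 10 <= code <= 19:
            by_patient[patient]["normal"].append(bc)
    return {p: s for p, s in by_patient.items() if s["tumor"] and s["normal"]}


# Illustrative barcodes only; TCGA-44-2655-01 is assumed for the example.
barcodes = ["TCGA-09-0364-01", "TCGA-44-2655-01", "TCGA-44-2655-11"]
print(matched_pairs(barcodes))
```

Here only TCGA-44-2655 is kept, because TCGA-09-0364 has a tumor sample but no normal sample in the list.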

But I just checked, there is no matched normal sample available for download for Ovarian serous cystadenocarcinoma in TCGA. I went to the data matrix portal, selected RNASeq and RNASeqV2 in Data Type, Level 3 in Data Level, and Tumor - matched & Normal - matched in Tumor/Normal section. It returned only Matched Tumor samples but no matched Normal samples. I guess they are not available for download yet.

ADD REPLY
0

@ komal.rathi

Yes, I could not find the matched normal samples either, for both RNASeq and RNASeqV2 in the data type for Level 3. It also returned only blue codes, which are for matched tumor samples. So I guess it will not be possible for me to get a small patient cohort with matched tumor and normal RNA-Seq data. Would it be helpful to download the clinical data from any other repositories? Any input on that? I have asked a question in another link, if you would like to answer.

ADD REPLY
0

ivivek_ngs I am not aware of any other repository but I will try to find it.

ADD REPLY
0

Oh, alright! Thanks!

ADD REPLY
0

Download-->TCGA-Assembler software

Download-->TCGA-Assembler Manual: http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20User%20Manual.pdf

Refer to section--> "ExtractTissueSpecificSamples" on page 27.

ADD REPLY
2
8.3 years ago
JJ ▴ 710

Hi,

Download the clinical files, e.g., here: http://firebrowse.org

If you then look at one of the merged_only_clinical files, e.g., KIRC.merged_only_clinical_clin_format.txt, look at the barcodes: https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode

The two digits at positions 14-15 of the barcode indicate the sample type.

Tumor types range from 01 - 09, normal types from 10 - 19 and control samples from 20 - 29.

So a first digit of 0 means tumor and 1 means normal; e.g., 01 is a primary tumour.

Some datasets will contain normals, some only cancer samples.
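
As a small sketch of the rule above, this is how the sample type could be read off a barcode in Python. The function name `sample_class` is just for illustration, and returning `None` for barcodes shorter than 15 characters (patient-level IDs without a sample-type field) is my own choice.

```python
def sample_class(barcode):
    """Classify a TCGA barcode by the sample-type code at positions 14-15 (1-indexed)."""
    if len(barcode) < 15:
        return None  # patient-level barcode, no sample-type field present
    code = int(barcode[13:15])  # 0-indexed slice covering positions 14-15
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


for bc in ["TCGA-09-0364-01", "TCGA-44-2655-11", "TCGA-08-0531"]:
    print(bc, sample_class(bc))
```

The slice also works for full-length barcodes, since the sample-type digits sit at the same position regardless of what follows them.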

EDIT: RNASeq V2 RSEM normalized expression values are available over http://firebrowse.org as well.

Best,
Julia

ADD COMMENT
0

OK, thanks. They should add this option to their search tool... It's a bit of a pain in the a#* ;)

ADD REPLY
0

For filenames that don't have position 14-15, is position 6-7 equivalent?

e.g.

TCGA-08-0531 -> Tumor ;
TCGA-12-0615 -> Control ;
TCGA-26-1438 -> Normal ;

Thanks for the link to firebrowse Julia. Great resource!

ADD REPLY
0

Nope, that is not the same: positions 6-7 are the tissue source site code, not the sample type.

ADD REPLY
0

Hi Julia,

As bann13 pointed out, I don't see the format that you mentioned in the KIRC.merged_only_clinical_clin_format.txt file; instead I see "tcga-3z-a93z", which is missing positions 14-15. I am looking for lung cancer (LUAD) normal and cancer patient gene expression data. I also checked the LUAD file and found the same format: tcga-05-4244.

Help will be appreciated.

ADD REPLY
0

In the clinical data you (mostly) won't find information about normal or tumor, i.e. positions 14-15, simply because both samples come from the same patient, so the clinical files don't duplicate that information.
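
In practice this means you take the sample type from the expression data barcodes and join back to the clinical file on the patient part. A minimal sketch, assuming the clinical IDs are the lowercased first 12 characters of the sample barcode (as in tcga-05-4244 above); the function names, the sample barcodes and the clinical row content are all made up for illustration.

```python
def patient_id(sample_barcode):
    """Patient-level ID as seen in the clinical files: first 12 characters, lowercased."""
    return sample_barcode[:12].lower()


def attach_clinical(sample_barcodes, clinical_rows):
    """Pair each sample barcode with its sample-type code and the patient's clinical row."""
    joined = []
    for bc in sample_barcodes:
        row = clinical_rows.get(patient_id(bc))  # None if the patient is not listed
        joined.append((bc, bc[13:15], row))
    return joined


# Hypothetical clinical table keyed by patient ID (placeholder content).
clinical = {"tcga-05-4244": {"note": "clinical fields for this patient"}}
samples = ["TCGA-05-4244-01", "TCGA-05-4244-11"]  # assumed sample barcodes
for entry in attach_clinical(samples, clinical):
    print(entry)
```

Both samples map to the same clinical row, which is why the clinical file only needs one entry per patient.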

ADD REPLY
