Hi everyone,
I am using the TCGA portal to get mRNA expression data for various cancer studies (e.g. lung, liver, thyroid, etc.). We have been on the lookout for control/normal samples for the cancer studies on TCGA. On the website we could find case/tumor samples but couldn't find any control samples.
Has anyone used control/normal samples from TCGA and can point me to them? Or do you know of a good resource (preferably using RNASeq V2 RSEM normalized expression values or z-scores) for control/normal samples in tissues like lung, liver, thyroid, etc. (basically all the foregut tissues)?
Thanks!
You can use TCGA-Assembler for that. There is a Nature Methods paper describing it (see the reference on the link).
When you download the data using the DownloadRNASeqData function, you can specify whether you want normal, primary tumor, recurrent tumor or metastatic samples. This will download RNASeqV1 or V2 Level 3 data (RSEM normalized or not). You will have to transform it into z-scores yourself, though. You can do that by following this thread in Google Groups: match the sample names for matched samples, or take the average of the normal controls for the non-matched data.
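As a rough illustration of the z-score step for non-matched data, here is a minimal Python sketch. It assumes one common convention: each tumor value for a gene is centred on the mean of the normal samples and scaled by their standard deviation. The Google Groups thread mentioned above may use a different formula, and the function name and the numbers are made up for illustration.

```python
from statistics import mean, stdev


def zscores_against_normals(tumor_values, normal_values):
    """Z-score one gene's tumor values against that gene's normal samples.

    Both arguments are lists of expression values for a single gene,
    assumed to be on the same scale (e.g. both RSEM normalized).
    """
    if len(normal_values) < 2:
        raise ValueError("need at least two normal samples to estimate a standard deviation")
    mu = mean(normal_values)
    sd = stdev(normal_values)  # sample standard deviation of the normals
    if sd == 0:
        raise ValueError("normal samples have zero variance for this gene")
    return [(value - mu) / sd for value in tumor_values]


# Made-up numbers for a single gene, for illustration only.
normals = [10.0, 12.0, 14.0]
tumors = [16.0, 12.0, 8.0]
print(zscores_against_normals(tumors, normals))
```

In practice you would run this per gene (per row of the expression matrix), and you may want to log-transform the values first; that choice is not covered here.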
Thanks, what russhh said worked for me, but I will definitely give this a try. Looks promising!
TCGA-Assembler is out of service. Is there a good alternative?
TCGA Firehose
Hi,
Since TCGA data are now on the NCI website, how can I download gene expression data (FPKM) for breast cancer and the associated normal tissue? I cannot find any "normal tissue" option (maybe I missed it).
For example here's the selection for breast cancer expression data.
Since this is a separate query, you might consider starting a new question
There's certainly RNASeq data from matched normal samples (ie, normal lung tissue from a lung cancer patient) for the lung samples, eg TCGA-44-2655-11 here.
So, there are a lot more TN (tumor samples that have matched normals) than NT (normal samples that have matched tumors). How is this possible? Shouldn't the number of TN be the same as NT?
I don't know what you mean, that's certainly not what I thought I'd said - apologies.
There are very few control samples (i.e., normal lung tissue from individuals who do not have cancer), but for around 20-25% of the lung tumour samples there is an associated matched-normal lung sample.
Hence, there are more tumour samples without a matched-normal sample than tumour samples with one.
What I meant is that I referred to this & this: sample names ending in 01 are Tumor and those ending in 11 are Normal. When I went to the data matrix on TCGA for LUAD, there were options like Tumor-matched & Normal-matched. Also, according to this
So I am a bit confused: shouldn't there be an equal number of TN & NT when you check the data matrix?
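For anyone who wants to check the barcode convention mentioned above (01 for tumor, 11 for normal) programmatically, here is a small Python sketch. It relies on the TCGA sample type ranges as I understand them (01-09 tumor, 10-19 normal, 20-29 control); the function names are my own, not part of any TCGA tool.

```python
def sample_type_code(barcode):
    """Return the two-digit sample type code from a TCGA barcode.

    Works for short barcodes such as TCGA-44-2655-11 and for longer ones
    where the fourth field carries a vial letter (e.g. 01A).
    """
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA" or not parts[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    return parts[3][:2]


def classify(barcode):
    """Map a barcode to 'tumor', 'normal', 'control' or 'unknown'."""
    code = int(sample_type_code(barcode))
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


for bc in ["TCGA-09-0364-01", "TCGA-44-2655-11"]:
    print(bc, classify(bc))
```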
Hi komal.rathi, if I want to analyse the TCGA data discussed above with a differential expression test (for paired data), is the TN set too small compared with the NT set for a given cancer type? That might bias the result.
Would it be better to use RNASeq data from normal samples (from individuals without any cancer) as the control set for the differential analysis against a given cancer? Could you point me to where I could get an RNASeq dataset comparable with TCGA?
Thanks!
@ komal.rathi
I need to download only the RNA-Seq data (raw read counts for gene quantification) for ovarian cancer patients from TCGA. I am not interested in downloading all the cases present in TCGA; I want a reasonable number of patients with a tumor and its matched normal for which I can retrieve the RNA-Seq raw counts. I am a bit confused about which selection criteria to use. I have downloaded the 489 cases of OvaCa data from TCGA with gene expression values, but there is no indication of which are normal and which are tumor. Can you let me know how I should do this from the portal? Correct me if I am wrong: I should first select TN RNA-Seq data for OV (color code blue), which will give batch-wise RNA-Seq V1 for tumor tissues. Then I should select NT to find the expression data for the normal samples corresponding to the tumor data I downloaded, right? Please share your thoughts.
ivivek_ngs
I am assuming you have the barcodes, e.g. TCGA-09-0364-01, for each of your samples. This is the code table I referred to. The last two digits tell you whether it is a tumor or normal sample. I used TCGA-Assembler to first download everything and then extract the matched Tumor and Normal samples. When you download from the data matrix, blue is for Matched Tumor samples and yellow is for Matched Normal samples. But I just checked, and there is no matched normal sample available for download for Ovarian serous cystadenocarcinoma in TCGA. I went to the data matrix portal, selected RNASeq and RNASeqV2 in Data Type, Level 3 in Data Level, and Tumor - matched & Normal - matched in the Tumor/Normal section. It returned only Matched Tumor samples and no Matched Normal samples. I guess they are not available for download yet.
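If you already have a list of barcodes, the "download everything, then extract the matched samples" step can be sketched in plain Python along these lines. This is only an illustration, not what TCGA-Assembler does internally: it assumes the first three barcode fields identify the patient, treats 01 as primary tumor and 11 as solid tissue normal, and the function name is hypothetical. The barcodes in the example list are illustrative; the -01 counterpart of TCGA-44-2655 is assumed.

```python
from collections import defaultdict


def matched_tumor_normal(barcodes):
    """Group barcodes by patient and keep patients with both a tumor (01)
    and a solid tissue normal (11) sample."""
    groups = defaultdict(lambda: {"tumor": [], "normal": []})
    for barcode in barcodes:
        parts = barcode.split("-")
        if len(parts) < 4:
            continue  # too short to carry a sample type code
        patient = "-".join(parts[:3])  # e.g. TCGA-44-2655
        code = parts[3][:2]            # e.g. 01 or 11
        if code == "01":
            groups[patient]["tumor"].append(barcode)
        elif code == "11":
            groups[patient]["normal"].append(barcode)
    # Keep only patients that have at least one sample of each kind.
    return {p: g for p, g in groups.items() if g["tumor"] and g["normal"]}


# Illustrative barcodes only.
barcodes = ["TCGA-44-2655-01", "TCGA-44-2655-11", "TCGA-09-0364-01"]
pairs = matched_tumor_normal(barcodes)
print(len(pairs), "patient(s) with matched tumor and normal:", sorted(pairs))
```

If the returned dictionary is empty for a cancer type, that is consistent with no matched normals being available for it.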
@ komal.rathi
Yes, I could not find the matched normal samples either, for both RNASeq and RNASeqV2 in Data Type at Level 3. It returned only blue codes, which are for matched tumor samples. So I guess it will not be possible for me to get even a small patient cohort with matched tumor and normal RNA-Seq data. Would it help to download the clinical data from any other repositories? Any input on that? I have asked a question in another link, if you would like to answer.
ivivek_ngs I am not aware of any other repository but I will try to find it.
Oh, alright! Thanks!
Download-->TCGA-Assembler software
Download-->TCGA-Assembler Manual: http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20User%20Manual.pdf
Refer to section--> "ExtractTissueSpecificSamples" on page 27.