I am currently using TCGA-Assembler to download TCGA data. I am interested in the RPKM and RSEM values for each gene. For RSEM, I can easily obtain the values per gene (around 20k values per sample). For RPKM, on the other hand, I can only obtain values per exon (around 230k values per sample), not per gene. My questions are:
Does anyone know if TCGA provides RPKM values per gene?
If not, given that I can map each exon to a gene (TCGA has a mapping for that), is it easy (or even possible) to obtain the RPKM values per gene, as with RSEM?
I can find this for RNASeqV1 data, but not for RNASeqV2 data. Any ideas? Furthermore, only a few datasets have this available for RNASeqV1. Is it possible to obtain this for RNASeqV2 data?
I am not sure about the second part. RNASeqV2 data are the results of a different processing pipeline, so I don't think you will get gene-wise RPKM (instead you have rsem.genes.normalized_results). Maybe this link will be helpful to you.
I already saw that page, but thanks for citing it. I realize that they do not provide RPKM values per gene, but I guessed it would be "easy" to derive these values from other information available in the files... I guess it is not the case.
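For what it's worth, the arithmetic of collapsing exon-level RPKM to one value per gene is simple if you accept a few assumptions. Since an exon's RPKM is 10^9 × reads / (total mapped reads × exon length), summing reads and lengths over a gene's exons gives a length-weighted mean of the exon RPKMs. The sketch below assumes that the exons of a gene do not overlap, that each exon maps to exactly one gene, and that you have already computed each exon's length in bp from its coordinates. The function name and the input layout are made up for illustration and are not part of TCGA-Assembler. The result will not necessarily match the gene-level RPKM distributed with RNASeqV1, because that came from a different pipeline.

```python
from collections import defaultdict


def gene_rpkm_from_exons(exons, exon_to_gene):
    """Collapse exon-level RPKM to gene-level RPKM.

    exons        : iterable of (exon_id, length_bp, rpkm) tuples
    exon_to_gene : dict mapping exon_id -> gene symbol

    Assumption: exons of a gene are non-overlapping, so the gene length
    is the sum of its exon lengths and reads are not counted twice.
    """
    weighted_sum = defaultdict(float)  # sum of rpkm * length per gene
    total_length = defaultdict(int)    # sum of exon lengths per gene

    for exon_id, length_bp, rpkm in exons:
        gene = exon_to_gene.get(exon_id)
        if gene is None:
            continue  # exon without a gene annotation is skipped
        weighted_sum[gene] += rpkm * length_bp
        total_length[gene] += length_bp

    # Length-weighted mean of exon RPKMs = RPKM over the summed exon length
    return {
        gene: weighted_sum[gene] / total_length[gene]
        for gene in weighted_sum
        if total_length[gene] > 0
    }


# Toy values, not real TCGA data
toy_exons = [("e1", 100, 10.0), ("e2", 300, 2.0), ("e3", 200, 5.0)]
toy_mapping = {"e1": "A", "e2": "A", "e3": "B"}
print(gene_rpkm_from_exons(toy_exons, toy_mapping))  # {'A': 4.0, 'B': 5.0}
```

If exons do overlap in the annotation, you would need to merge them into a union model and re-count reads, which cannot be done from the exon table alone.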
Download the TCGA data from this website. Then download the file mRNAseq_Preprocess.Level_3.2014xxxx00.0.0.tar.gz. Unpack it and you will find that the files contain RPKM data.
I have been having the same issue, as I imagine many others have. I am working on colon cancer, and a recent Nature Genetics paper from Isella et al. on subtyping/classification gives their full method for conversion and comparison of the RSEM and RPKM values available in RNASeqV1/V2. It required comparing samples found in both V1 and V2, so that might be a barrier if such samples are not available to you. If you work on CRC then you are in luck, as they have made a Bioconductor package containing all samples with "converted" RPKMs. Otherwise you would most likely be doing your community a huge favour by converting the data.
Thanks for the hint. But where did you see the conversion from RSEM to RPKM? I only found the part where they try to match the samples generated using the GA and HiSeq platforms but they are both already in RSEM (i.e. TCGA v2). Do you see what I mean?