TCGA exon data: de novo isoform assembly
juara, 8.4 years ago

Hello,

I am a complete beginner in RNA-seq, and I am trying to do a de novo isoform analysis of a particular gene using TCGA data. So far, I have managed to download the exon_quantification data and, using hg19, I worked out the locus of each exon of this gene. So now I have the reads for each exon. My data looks like this:

raw_counts  median_length_normalized  RPKM
56          0.9254658                 2.097887247
43          0.8739496                 2.174684905
82          1                         4.927216087
341         0.9669247                 1.455338147
48          1                         3.003161125
52          1                         4.383085855
109         1                         5.512573364
119         1                         4.570871422
158         1                         4.942702685
621         0.571371                  0.759681418

I have this for about 550 patients, and I do not really know what to do with it. Should I focus on raw_counts or RPKM? How can I tell whether this gene is alternatively spliced and, if so, where exactly and how to quantify it? And what does the median_length_normalized column tell me?
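
The exon_quantification files label each exon with a coordinate string, which this sketch assumes has the form chrom:start-end:strand (please check that against your own file). Under that assumption, pulling out the exons of one gene from an hg19 locus is just an interval-overlap filter, and the exon length it gives you is what RPKM divides by. A minimal sketch; the exon IDs and locus below are hypothetical:

```python
import re
from typing import List, NamedTuple

# Assumed exon ID layout "chrom:start-end:strand", coordinates 1-based inclusive.
EXON_RE = re.compile(r"^([^:]+):(\d+)-(\d+):([+-])$")


class Exon(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_exon_id(exon_id: str) -> Exon:
    m = EXON_RE.match(exon_id)
    if m is None:
        raise ValueError(f"unrecognised exon id: {exon_id!r}")
    chrom, start, end, strand = m.groups()
    return Exon(chrom, int(start), int(end), strand)


def exons_in_locus(exon_ids: List[str], chrom: str, locus_start: int,
                   locus_end: int, strand: str) -> List[Exon]:
    """Keep exons on the given strand that overlap [locus_start, locus_end]."""
    hits = []
    for exon_id in exon_ids:
        exon = parse_exon_id(exon_id)
        if (exon.chrom == chrom and exon.strand == strand
                and exon.start <= locus_end and exon.end >= locus_start):
            hits.append(exon)
    return sorted(hits, key=lambda e: e.start)


# Hypothetical exon IDs and locus, for illustration only.
exon_ids = [
    "chr1:11874-12227:+",
    "chr1:12613-12721:+",
    "chr1:14362-14829:-",
    "chr2:11900-12000:+",
]
hits = exons_in_locus(exon_ids, "chr1", 11000, 13000, "+")
for exon in hits:
    print(exon.chrom, exon.start, exon.end, exon.length)
```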

Thank you very much for your help.


Hi, is your splice isoform of interest a known isoform? If so, you could also look at the rsem.isoform.normalized files. The data look like this:

isoform_id  normalized_count
uc011lsn.1  0.0000
uc010unu.1  2.5219
uc010uoa.1  0.0000

You can then look up the value for your isoform of interest. If your isoform is a novel one, though, I am not sure the TCGA Level 3 data (which is what you are looking at now) will be of much use. Some details about the different files available are here.
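
Looking up a known isoform is a simple keyed lookup per sample. The sketch below assumes the file is tab-separated with the two header columns shown above; it uses the three example rows as inline data, and uc010unu.1 stands in for whatever UCSC isoform ID you are interested in:

```python
import csv
import io
from typing import Dict, TextIO

# The example rows shown above, assumed to be tab-separated with a header line.
normalized_txt = (
    "isoform_id\tnormalized_count\n"
    "uc011lsn.1\t0.0000\n"
    "uc010unu.1\t2.5219\n"
    "uc010uoa.1\t0.0000\n"
)


def load_isoform_values(handle: TextIO) -> Dict[str, float]:
    """Map isoform_id -> normalized_count for one sample's file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["isoform_id"]: float(row["normalized_count"]) for row in reader}


values = load_isoform_values(io.StringIO(normalized_txt))
my_isoform = "uc010unu.1"  # replace with the ID of your isoform of interest
print(my_isoform, values.get(my_isoform))
```

With one file per patient, you would open each file, call load_isoform_values on it and collect values.get(my_isoform) into one row per sample.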

You could still try your luck with the junction_quantification files, which look like this:

junction    raw_counts
chr1:12227:+,chr1:12595:+   0
chr1:14829:-,chr1:14970:-   135
chr1:14829:-,chr1:15796:-   0
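
Each junction key packs two positions and a strand into one string, so it helps to parse it before comparing junctions. The sketch assumes a tab-separated file with the header shown above and treats the two coordinates only as positions to compare with other coordinates from the same files. Grouping by the shared left coordinate shows alternative partners for one splice site:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, NamedTuple, Tuple


class Junction(NamedTuple):
    chrom: str
    left: int    # first coordinate in the junction string
    right: int   # second coordinate in the junction string
    strand: str


def parse_junction(key: str) -> Junction:
    """Parse e.g. 'chr1:14829:-,chr1:14970:-' into a Junction."""
    first, second = key.split(",")
    chrom1, pos1, strand1 = first.split(":")
    chrom2, pos2, strand2 = second.split(":")
    if chrom1 != chrom2 or strand1 != strand2:
        raise ValueError(f"inconsistent junction: {key!r}")
    return Junction(chrom1, int(pos1), int(pos2), strand1)


# The example rows shown above, assumed tab-separated.
junction_txt = (
    "junction\traw_counts\n"
    "chr1:12227:+,chr1:12595:+\t0\n"
    "chr1:14829:-,chr1:14970:-\t135\n"
    "chr1:14829:-,chr1:15796:-\t0\n"
)

counts = {
    parse_junction(row["junction"]): int(row["raw_counts"])
    for row in csv.DictReader(io.StringIO(junction_txt), delimiter="\t")
}

# Junctions sharing one end but not the other are alternative splice choices.
partners: Dict[Tuple[str, int, str], Dict[int, int]] = defaultdict(dict)
for j, n in counts.items():
    partners[(j.chrom, j.left, j.strand)][j.right] = n

print(partners[("chr1", 14829, "-")])
```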

I am not sure, though, whether the parameters given to RSEM (the program used) allowed it to report novel junctions as well. If not, you might need to get access to the TCGA BAM files, which requires controlled-access authorization.

If you are going to compare values across samples, you should use RPKM rather than raw counts, since raw counts also depend on how deeply each sample was sequenced. Also, be aware that there may be batch effects across samples (for example, because samples were processed on different dates), so you might want to do batch effect removal yourself or use the data from here.
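
To make the normalization concrete: RPKM divides a count by the feature length in kilobases and by the library size in millions of mapped reads. The batch part below is a deliberately naive stand-in (log-transform, then subtract each batch's mean), not what TCGA or any particular tool does; a real analysis would usually use a dedicated method such as ComBat from the R sva package. Sample names and batch labels are hypothetical:

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict


def rpkm(count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return count * 1e9 / (feature_length_bp * total_mapped_reads)


def center_by_batch(values: Dict[str, float],
                    batch_of: Dict[str, str]) -> Dict[str, float]:
    """Naive batch adjustment: log2(x + 1), then subtract each batch's mean.

    Only an illustration; it also removes real biology if batch and
    condition are confounded.
    """
    logged = {s: math.log2(v + 1) for s, v in values.items()}
    by_batch = defaultdict(list)
    for sample, lv in logged.items():
        by_batch[batch_of[sample]].append(lv)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return {s: lv - batch_mean[batch_of[s]] for s, lv in logged.items()}


# Same raw count, different sequencing depth:
print(rpkm(100, 1000, 20_000_000))  # 5.0
print(rpkm(100, 1000, 40_000_000))  # 2.5

# Hypothetical RPKM values for four samples processed in two batches.
adjusted = center_by_batch(
    {"s1": 3.0, "s2": 7.0, "s3": 1.0, "s4": 3.0},
    {"s1": "A", "s2": "A", "s3": "B", "s4": "B"},
)
print(adjusted)  # {'s1': -0.5, 's2': 0.5, 's3': -0.5, 's4': 0.5}
```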


Thank you very much for your help. I also downloaded the rsem.isoform.results file, which gives me the isoform ID, the raw_count and the scaled_estimate. I guess I need to use the scaled_estimate to compare across samples? Also, I did not understand why I cannot use the exon_quantification Level 3 data for a novel isoform. According to your link, this file has the counts mapped to each specific exon, so can't we just use those? For example, in my data the raw counts for exons 4 and 10 are much higher; does that mean they are transcribed as part of another isoform? Sorry if this sounds naive, but I am completely lost. I will download the junction quantification file too and see if I can find a way to use it. Thanks again for your help.
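
One thing worth checking before reading anything into the high raw counts: longer exons collect more reads. If the RPKM column follows the usual definition (count × 10^9 / (length × total mapped reads), with one total per sample), then count / RPKM is proportional to exon length, so the table in the question already encodes relative exon lengths. The sketch assumes the ten rows are exons 1 to 10 in order, as the question implies:

```python
# (raw_counts, RPKM) from the table in the question, assumed to be exons 1-10.
rows = [
    (56, 2.097887247),
    (43, 2.174684905),
    (82, 4.927216087),
    (341, 1.455338147),
    (48, 3.003161125),
    (52, 4.383085855),
    (109, 5.512573364),
    (119, 4.570871422),
    (158, 4.942702685),
    (621, 0.759681418),
]

# Under the assumed RPKM definition, count / RPKM = length * total_reads / 1e9,
# i.e. proportional to exon length within one sample.
ratios = [count / value for count, value in rows]
shortest = min(ratios)
relative_length = [r / shortest for r in ratios]

for exon_no, ((count, value), rel) in enumerate(zip(rows, relative_length), start=1):
    print(f"exon {exon_no:2d}  raw={count:4d}  RPKM={value:.2f}  rel. length={rel:5.1f}")
```

With these numbers, exons 4 and 10 come out as by far the longest and also have the two lowest RPKM values, so their high raw counts seem to reflect exon length rather than extra expression. On its own this says nothing either way about alternative splicing.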


The reason I said the Level 3 data might not be useful is that some of the transcript assemblers I have used, such as Cufflinks or StringTie, have an option to return expression levels for known transcript isoforms only. You can of course turn that off and ask for both known and novel isoforms. For the TCGA RNA-seq Level 3 data, RSEM was used, and I don't know how it was run, i.e. whether it was asked to return values for novel isoforms or not.
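
For reference, the known-only versus known-plus-novel switch corresponds to these options: in StringTie, -e restricts output to the transcripts in the -G annotation; in Cufflinks, -G quantifies reference transcripts only, while -g does reference-guided assembly that can also report novel isoforms. Both need aligned reads, i.e. the BAM files. The sketch only builds and prints the command lines; all file names are hypothetical:

```python
from typing import List


def stringtie_cmd(bam: str, ref_gtf: str, out_gtf: str, known_only: bool) -> List[str]:
    """Build a StringTie command line; -e restricts output to the -G transcripts."""
    cmd = ["stringtie", bam, "-G", ref_gtf, "-o", out_gtf]
    if known_only:
        cmd.append("-e")  # estimate abundance of reference transcripts only
    return cmd


def cufflinks_cmd(bam: str, ref_gtf: str, out_dir: str, known_only: bool) -> List[str]:
    """Build a Cufflinks command line.

    -G: quantify only the reference transcripts.
    -g: reference-guided assembly (RABT), which can also report novel isoforms.
    """
    flag = "-G" if known_only else "-g"
    return ["cufflinks", flag, ref_gtf, "-o", out_dir, bam]


# Hypothetical file names; the commands are printed, not executed.
for known_only in (True, False):
    print(" ".join(stringtie_cmd("sample.bam", "genes.gtf", "sample.gtf", known_only)))
    print(" ".join(cufflinks_cmd("sample.bam", "genes.gtf", "cuff_out", known_only)))
```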

Again, I cannot say much about how to interpret the exon_quantification results, as I haven't read much about the RSEM methodology; I have only used the gene-level or isoform-level data. The one thing I can suggest is this: does your isoform have a unique exon-exon junction, one that is not present in any other isoform from that gene locus? If so, you can look for that junction in the junction_quantification file. That is probably the lowest-hanging fruit you can aim for.
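
Finding a junction unique to one isoform is a set difference: collect the (end of exon, start of next exon) pairs for every isoform of the locus and subtract those used by the others. The sketch then formats the result like the junction_quantification keys, assuming the first coordinate is the last base of the upstream exon and the second the first base of the downstream exon (1-based; UCSC exonStarts are 0-based, so add 1). The isoforms and coordinates are hypothetical:

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]          # (start, end), 1-based genomic coordinates
JunctionPair = Tuple[int, int]  # (end of upstream exon, start of downstream exon)


def junctions(exons: List[Exon]) -> Set[JunctionPair]:
    """Exon-exon junctions of one isoform, in genomic order."""
    ordered = sorted(exons)
    return {(a[1], b[0]) for a, b in zip(ordered, ordered[1:])}


def unique_junctions(target: str, isoforms: Dict[str, List[Exon]]) -> Set[JunctionPair]:
    """Junctions of `target` that no other isoform of the locus uses."""
    others: Set[JunctionPair] = set()
    for name, exons in isoforms.items():
        if name != target:
            others |= junctions(exons)
    return junctions(isoforms[target]) - others


def tcga_key(chrom: str, strand: str, junction: JunctionPair) -> str:
    """Format like 'chr1:14829:-,chr1:14970:-' (coordinate convention assumed)."""
    left, right = junction
    return f"{chrom}:{left}:{strand},{chrom}:{right}:{strand}"


# Hypothetical gene with three isoforms; isoform "B" skips the middle exon.
isoforms = {
    "A": [(100, 200), (300, 400), (500, 600)],
    "B": [(100, 200), (500, 600)],
    "C": [(100, 200), (300, 400)],
}
keys = sorted(tcga_key("chr1", "+", j) for j in unique_junctions("B", isoforms))
print(keys)  # ['chr1:200:+,chr1:500:+']
```

The resulting keys can then be looked up in each patient's junction_quantification file; an isoform with an empty unique set (like "C" here) cannot be tracked this way.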


Thank you for your input. I will try it and see how it goes, and I will post an update here for others.
