Hello everyone,
Could anyone briefly explain how TCGA generates their isoform expression data? Feel free to leave any informative links in the comments as well. Thank you in advance!
This is not easy information to find! The best explanation I've found is here: https://github.com/zyxue/MapspliceRSEM-clean-doc, which should be read in conjunction with this: https://bsbludwig.com/post/94066296740/what-do-tcgas-rnaseq-files-actually-show#fn:1
In short, the Broad Firehose data is calculated by RSEM (https://github.com/deweylab/RSEM). You get two columns per sample, raw_counts and scaled_estimate. These are not the same names as those actually output by RSEM. raw_counts apparently corresponds to expected_count in the RSEM output, and scaled_estimate is related to TPM: TPM is this value multiplied by 1e6.
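To make the scaled_estimate to TPM relationship concrete, here is a minimal sketch. The function name and the example values are made up for illustration; the only thing taken from the description above is the multiplication by 1e6.

```python
def scaled_estimate_to_tpm(scaled_estimates):
    """Convert scaled_estimate values to TPM by multiplying by 1e6."""
    return [value * 1e6 for value in scaled_estimates]


# Hypothetical scaled_estimate values for four isoforms in one sample.
example = [0.5, 0.25, 0.125, 0.125]
tpm = scaled_estimate_to_tpm(example)
print(tpm)
```

If scaled_estimate really is a per-sample fraction, the TPM values for all isoforms in one sample should add up to roughly one million, which is a quick sanity check on a downloaded file.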
The normalized files are upper-quartile normalized: as far as I can tell, the raw counts are divided by the sample's 75th percentile and then multiplied by 300 for isoforms (no idea why 300).
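For the normalized files, here is a rough sketch assuming the usual form of upper-quartile normalization: each raw count is divided by the sample's 75th percentile and rescaled so that the percentile lands on a target of 300. The function name, the nearest-rank percentile, and the decision to include zeros in the percentile are all my assumptions, not taken from the TCGA pipeline, so treat it as an illustration of the idea rather than a reimplementation.

```python
import math


def upper_quartile_normalize(raw_counts, target=300.0, quantile=0.75):
    """Rescale counts so that the chosen quantile equals `target`.

    Assumption: nearest-rank percentile over all values, zeros included.
    The actual TCGA script may compute the percentile differently.
    """
    if not raw_counts:
        return []
    ordered = sorted(raw_counts)
    rank = max(1, math.ceil(quantile * len(ordered)))
    upper_quartile = ordered[rank - 1]
    if upper_quartile == 0:
        raise ValueError("75th percentile is zero; cannot normalize")
    return [count * target / upper_quartile for count in raw_counts]


# Hypothetical raw counts for four isoforms in one sample.
counts = [10.0, 20.0, 30.0, 40.0]
normalized = upper_quartile_normalize(counts)
print(normalized)
```

With these made-up counts the 75th percentile is 30, so that isoform ends up at exactly 300 and the others scale proportionally.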
I actually raised a related issue previously: What are RSEM normalized values?
It's reassuring to see that this is indeed not as simple to decipher as it may seem.
Which collection of TCGA data are you interested in? The data on https://portal.gdc.cancer.gov/? The data on the Broad Institute's Firehose? Some other source?
I am interested in the isoform expression data on Firehose.
science03 : Please do not delete threads once they have received a comment/answer. If your question was solved then you should accept the answer provided below (green check mark).