Hi
Is it possible to retrieve TPM from raw counts or DESeq2 object dds?
I have downloaded the data from TCGA using the "TCGAbiolinks" package in R. Although I have downloaded the whole data set, I am restricting the example here to a few samples so that it is easier for you to reproduce and inspect:
library(TCGAbiolinks)
listSamples <- c("TCGA-E9-A1NG-11A-52R-A14M-07","TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-A7-A13G-11A-51R-A13Q-07","TCGA-BH-A0DK-11A-13R-A089-07", "TCGA-E9-A1RH-11A-34R-A169-07","TCGA-BH-A0AU-01A-11R-A12P-07", "TCGA-C8-A1HJ-01A-11R-A13Q-07","TCGA-A7-A13D-01A-13R-A12P-07", "TCGA-A2-A0CV-01A-31R-A115-07","TCGA-AQ-A0Y5-01A-11R-A14M-07")
query <- GDCquery(project = "TCGA-BRCA", data.category = "Gene expression", data.type = "Gene expression quantification", experimental.strategy = "RNA-Seq", platform = "Illumina HiSeq", file.type = "results", barcode = listSamples, legacy = TRUE)
GDCdownload(query = query, directory = 'BRCA_test', method = 'api')
If you run these commands, you will see that files are created in the directory "BRCA_test"; each folder contains files with the extension *.rsem.genes.results.
Each file contains the columns gene_id, raw_counts, scaled_estimates, transcript_id.
I am also able to create a dds object with the DESeq2 package for this data using:
library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = BRCA_mat, colData = sampleData, design = ~ condition)
I used the "raw_counts" column to build "BRCA_mat".
Now my question is: can I get a TPM matrix from the raw count matrix?
I am aware that I need featureLength and meanFragmentLength to calculate TPM, but the data I get from TCGA does not include any length information.
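For reference, this is the calculation I have in mind, written as a minimal sketch. It assumes a vector of gene lengths in base pairs were available, which is exactly what I am missing. The names counts_to_tpm, gene_length_bp, toy_counts and toy_length are hypothetical and only for illustration; the sketch also uses plain gene length and ignores the effective-length correction that meanFragmentLength would provide.

```r
# Sketch only: TPM from raw counts, ASSUMING gene lengths are known.
# counts_to_tpm and gene_length_bp are hypothetical names, not part of
# DESeq2 or TCGAbiolinks.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  # Reads per kilobase: each gene (row) is divided by its length in kb.
  rate <- counts / (gene_length_bp / 1000)
  # Rescale each sample (column) so that it sums to one million.
  t(t(rate) / colSums(rate)) * 1e6
}

# Toy data: 3 genes, 2 samples (made-up numbers, not TCGA values).
toy_counts <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_length <- c(1000, 2000, 3000)

tpm <- counts_to_tpm(toy_counts, toy_length)
colSums(tpm)  # every sample sums to 1e6 by construction
```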
So is it possible to get a TPM or even an FPKM matrix from the raw count matrix?
If I can get FPKM, I can convert it to TPM myself.
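The FPKM-to-TPM step I am referring to is just a per-sample rescaling. A minimal sketch, assuming an FPKM matrix with genes in rows and samples in columns (fpkm_to_tpm and toy_fpkm are hypothetical names with made-up values):

```r
# Sketch only: convert an FPKM matrix to TPM by rescaling each sample
# (column) so that its values sum to one million.
fpkm_to_tpm <- function(fpkm) {
  t(t(fpkm) / colSums(fpkm)) * 1e6
}

# Toy data: 2 genes, 2 samples (made-up numbers).
toy_fpkm <- matrix(c(1, 3, 2, 2), nrow = 2,
                   dimnames = list(c("g1", "g2"), c("s1", "s2")))

tpm_from_fpkm <- fpkm_to_tpm(toy_fpkm)
colSums(tpm_from_fpkm)  # every sample sums to 1e6 by construction
```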
And if I do this:
dds <- estimateSizeFactors(dds)
counts(dds, normalized=TRUE)
will these normalized counts be equivalent to TPM?
I think not. This just divides each column of counts(dds) by the corresponding entry of sizeFactors(dds), so it corrects for sequencing depth but does not normalize for gene length.
So I would like to get a TPM matrix from the raw count matrix. Please help. Thanks in advance.