I have raw counts and gene lengths. Is there any tool that can compute TMM-normalised, gene-length-corrected data? I want to compare the expression of one gene to another gene. As far as I can tell, standard TPM doesn't take compositional bias into account.
Would something like this in DESeq2 work?
x <- raw_counts/gene_length
library(DESeq2)
d <- DESeqDataSetFromMatrix(countData=x,colData=m,design=~Sample)
d <- DESeq2::estimateSizeFactors(d,type="ratio")
y <- counts(d,normalized=TRUE)
And y would be GeTMM.
I think so, although I would do it the other way round, since DESeq2 expects raw counts as input: first do the TMM normalization, then scale by gene length. I'm not sure it will change the results much, but at least it should stop DESeq2 from throwing warnings.
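For reference, the "normalize first, then scale by gene length" order can also be sketched with edgeR, where TMM is implemented. This is only a minimal sketch on simulated counts: the matrix, the sample names and the gene lengths below are made-up placeholders, not the data from the question.

```r
library(edgeR)

set.seed(1)
n_genes <- 500

# Simulated raw counts for 4 samples (placeholder data, not real measurements)
raw_counts <- matrix(
  rnbinom(n_genes * 4, mu = 200, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("S", 1:4))
)

# Simulated gene lengths in base pairs (placeholder values)
gene_length <- sample(500:5000, n_genes, replace = TRUE)

# Step 1: TMM normalization factors computed on the raw counts
dge <- DGEList(counts = raw_counts)
dge <- calcNormFactors(dge, method = "TMM")

# Step 2: scale by gene length; for a DGEList, rpkm() uses the
# TMM-adjusted library sizes (lib.size * norm.factors) by default
tmm_rpkm <- rpkm(dge, gene.length = gene_length)

head(tmm_rpkm)
```

The result is a genes-by-samples matrix of length-corrected values on a TMM-adjusted library-size scale.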
I tried to put together a reproducible example that goes from raw counts to TMM-normalized, gene-length-corrected TPM values.
Download sample data file GSE60450_Lactation-GenewiseCounts.txt from https://figshare.com/s/1d788fd384d33e913a2a.
Move the count data into a new object and set the gene IDs as row names.
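A minimal sketch of this step, using a tiny in-memory stand-in for the downloaded table. The column names (GeneID, Length, SampleA, SampleB) and all values are assumptions for illustration; the real file's layout may differ.

```r
# Toy stand-in for the downloaded table (hypothetical columns and values)
seqdata <- data.frame(
  GeneID  = c("g1", "g2", "g3"),
  Length  = c(1200, 800, 3000),
  SampleA = c(40L, 0L, 106L),
  SampleB = c(30L, 2L, 182L)
)

# Keep only the count columns and use the gene IDs as row names
countdata <- as.matrix(seqdata[, -(1:2)])
rownames(countdata) <- seqdata$GeneID

# Keep the gene lengths as a named vector in the same gene order
gene_length <- setNames(seqdata$Length, seqdata$GeneID)

countdata
```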
Create DESeq2 object.
Then the gene length file is prepared. I am really not sure about this part. DESeq2 needs an assay of gene lengths I think. So I prepared a dataframe equal in dimensions to the input count dataframe.
Add gene lengths to DESeq2 object.
Now, FPM and FPKM can be computed.
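A minimal sketch of the DESeq2 steps on simulated data, since the original code is not shown here. One assumption worth flagging: rather than a full matrix of lengths, this uses a single length per gene stored in mcols(dds)$basepairs, which is one of the inputs fpkm() accepts (a matrix assay named avgTxLength is another). The counts, lengths and the group column are placeholders.

```r
library(DESeq2)

set.seed(1)
n_genes <- 200

# Simulated integer counts for 4 samples (placeholder data)
counts_mat <- matrix(
  rnbinom(n_genes * 4, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("S", 1:4))
)
coldata <- data.frame(
  row.names = colnames(counts_mat),
  group = factor(c("a", "a", "b", "b"))
)

dds <- DESeqDataSetFromMatrix(countData = counts_mat,
                              colData = coldata,
                              design = ~ group)

# One gene length (bp) per row; placeholder values
mcols(dds)$basepairs <- sample(500:5000, n_genes, replace = TRUE)

# Median-of-ratios size factors, used by fpm()/fpkm() when robust = TRUE
dds <- estimateSizeFactors(dds)

fpm_mat  <- fpm(dds, robust = TRUE)
fpkm_mat <- fpkm(dds, robust = TRUE)

head(fpkm_mat)
```

Note that the size factors here are DESeq2's median-of-ratios estimates, which are related to but not the same as edgeR's TMM factors.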
For TPM, calculate FPKM, then divide each column by its column sum and multiply by 10⁶.
Now compute the column sums of the TPM matrix.
They don't add up to 1M.
Use t to transpose the matrix before dividing by the column sums, then transpose it back.
Yes! That works! Thanks!
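To illustrate why the transpose matters, here is a small base R sketch with a made-up FPKM matrix. Dividing a matrix by a vector recycles the vector down the rows, so a plain division by colSums() pairs values with the wrong column sums. The matrix values and names are placeholders.

```r
# Toy FPKM matrix: 3 genes x 2 samples (placeholder values)
fpkm_mat <- matrix(
  c(10, 20, 70,
     5,  5, 40),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("S1", "S2"))
)

# Pitfall: the column sums are recycled along rows, not matched to columns
wrong <- fpkm_mat / colSums(fpkm_mat) * 1e6

# Transpose so samples are rows, divide, then transpose back
tpm <- t(t(fpkm_mat) / colSums(fpkm_mat)) * 1e6

# Equivalent alternative without the double transpose:
# tpm <- sweep(fpkm_mat, 2, colSums(fpkm_mat), "/") * 1e6

colSums(wrong)
colSums(tpm)
```

One thing to keep in mind: because each column is divided by its own sum, any per-sample scaling factor applied upstream (a size factor or TMM factor) cancels out in this step, so the resulting values match TPM computed without that factor.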