Hello, I am trying to follow the preprocessing steps described in this publication (Individualized multi-omic pathway deviation scores using multiple factor analysis). As explained in the supplementary material, the authors followed the steps below:
Normalized miRNA abundance was quantified as Reads per million microRNA mapped (RPMMM) values. RNAseq and miRNA-seq quantifications were TMM-normalized (Robinson and Oshlack, 2010), converted to counts per million (CPM), and log2-transformed.
As I understand it, the following steps were performed:
- Quantification of miRNA abundance as RPMMM values -> this is already done when downloading miRNA data from TCGA
- TMM normalization for RNA-seq and miRNA-seq data
- Conversion to counts per million (CPM)
- Log2 transformation
I have written the R code below. However, since this is my first time working with miRNA data, I am not sure whether everything is implemented correctly.
# Calculate TPM for RNA-seq data, given a vector of gene lengths
x <- RNA_counts/geneLength
norm_RNA_counts <- t(t(x) * 1e6 / colSums(x))
# Scale miRNA-seq counts to the median library size (my stand-in for TPM)
library_sizes_miRNA <- colSums(miRNA_counts)
scaling_factors_miRNA <- median(library_sizes_miRNA) / library_sizes_miRNA
norm_miRNA_counts <- t(t(miRNA_counts) * scaling_factors_miRNA)
# Calculate CPM for RNA-seq (total is summed over the whole matrix)
total_mapped_reads <- sum(norm_RNA_counts)
cpm_RNA <- norm_RNA_counts / total_mapped_reads * 1e6
# Calculate CPM for miRNA (total is summed over the whole matrix)
total_mapped_reads <- sum(norm_miRNA_counts)
cpm_miRNA <- norm_miRNA_counts / total_mapped_reads * 1e6
# Log2 transformation with a pseudocount of 1
log2_cpm_RNA <- log2(cpm_RNA + 1)
log2_cpm_miRNA <- log2(cpm_miRNA + 1)
I looked through many posts and found the TPM code for RNA-seq data there. However, I could not find anything specific for miRNA. I would appreciate any comments on issues with the code.
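For comparison, the steps quoted from the supplement (TMM normalization, conversion to CPM, log2 transformation) can be sketched with edgeR, the Bioconductor package that implements the TMM method of Robinson and Oshlack. This is only a minimal sketch under stated assumptions: the input is a matrix of raw counts with features in rows and samples in columns, `toy_counts` is a made-up stand-in for `RNA_counts` or `miRNA_counts`, and the prior count of 1 is my choice, since the publication does not state which function calls or pseudocount the authors used. Note that TMM is a different method from TPM and does not use gene lengths, and that CPM is a per-sample quantity: each column is divided by its own effective library size.

```r
# Minimal sketch: TMM -> CPM -> log2 with edgeR (Bioconductor).
# Assumption: raw counts, features (genes or miRNAs) in rows, samples in columns.
library(edgeR)

set.seed(1)
# Hypothetical toy count matrix standing in for RNA_counts or miRNA_counts
toy_counts <- matrix(
  rnbinom(200 * 6, mu = 50, size = 2),
  nrow = 200,
  dimnames = list(paste0("feature", 1:200), paste0("sample", 1:6))
)

dge <- DGEList(counts = toy_counts)

# Compute one TMM scaling factor per sample
dge <- calcNormFactors(dge, method = "TMM")

# cpm() on a DGEList divides each sample by lib.size * norm.factors;
# log = TRUE returns log2-CPM, with a prior count added to avoid log2(0).
# prior.count = 1 is an assumption, not a value taken from the publication.
log2_cpm <- cpm(dge, log = TRUE, prior.count = 1)
```

The same three calls would be applied separately to the RNA-seq and the miRNA-seq count matrices. If the miRNA data at hand are already RPMMM values rather than raw counts, this sketch does not apply as written, because TMM is designed for raw counts.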
Yes, I found some similar ideas on Biostars. So you think the authors did something wrong, and I should only consider CPM?
When everything is the same size, correcting for size is pointless. It doesn't change the numbers much at all.
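If "size" here refers to feature length, the point can be checked with a small toy example. Mature miRNAs are all roughly 22 nt long, so dividing by length before rescaling (as in TPM) cancels out and gives the same values as plain CPM. The counts and the constant length of 22 below are made-up illustration values, not data from the publication.

```r
# Toy miRNA count matrix (hypothetical values): 3 miRNAs x 2 samples
counts <- matrix(
  c(10, 30, 60,
     5,  5, 90),
  nrow = 3,
  dimnames = list(c("miR_a", "miR_b", "miR_c"), c("s1", "s2"))
)

# Assumption for illustration: every feature has the same length (22 nt)
mature_length <- rep(22, nrow(counts))

# CPM: scale each sample (column) by its own library size
cpm <- t(t(counts) / colSums(counts)) * 1e6

# TPM: divide each row by its length first, then rescale each sample to 1e6
rate <- counts / mature_length
tpm <- t(t(rate) / colSums(rate)) * 1e6

# With a constant length, the length term cancels and the two agree
all.equal(cpm, tpm)
```

With real gene lengths, which vary a lot for RNA-seq, the two matrices would differ, which is why length correction only matters when features are not all the same size.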