Hi,
I am new to RNA-seq and have been reading a lot about different normalization methods on Biostars, but I couldn't find an answer to the problem I am facing:
I have a dataset from this paper (dx.plos.org/10.1371/journal.pone.0118528), and as far as I understand, the authors used TMM normalization:
Samples were grouped and quantification of transcript abundance was performed on this final read list using Trimmed Means of M-values (TMM) as the normalization method [27]. Output data utilized for all subsequent comparisons was a normalized signal value generated by AvadisNGS.
(p. 5)
Now I want to compare these data with a TCGA dataset that was quantified with RSEM (https://wiki.nci.nih.gov/display/TCGA/RNASeq+Version+2).
I found some posts on similar questions (e.g. "TMM normalisation from RSEM raw counts"), but I still don't know how to proceed. Can anyone help me out?
Thank you so much!
Max
OK, I tried to do this in R as described in the post "RNA-seq normalization: How to use TMM and rpkm() in EdgeR".
I used the "mRNAseq_raw_counts" and "mRNAseq_median_length_normalized" data files from TCGA.
However, the resulting matrix contains many NaN and Inf values and looks nothing like the original distribution. Any ideas what went wrong?
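One plausible cause (an assumption, not a diagnosis): RPKM divides each count by the gene length in kilobases and by the effective library size in millions, which is roughly what edgeR's rpkm() computes when TMM factors are present. Any gene whose "length" is 0 therefore becomes Inf (or NaN if its count is also 0), and a missing length propagates as NaN. The hypothetical function below sketches this for one sample; because plain Python raises ZeroDivisionError on x/0, it maps that case explicitly to mimic R's IEEE arithmetic.

```python
import math


def rpkm(counts, gene_lengths, lib_size, norm_factor=1.0):
    """Hypothetical helper: RPKM for one sample.

    RPKM = count / (length in kb) / (effective library size in millions),
    with effective library size = lib_size * norm_factor (e.g. a TMM factor).
    """
    eff_lib_millions = lib_size * norm_factor / 1e6
    out = []
    for c, length in zip(counts, gene_lengths):
        per_kb = length / 1e3
        if per_kb == 0:
            # R semantics: x/0 -> Inf, 0/0 -> NaN
            out.append(math.nan if c == 0 else math.inf)
        else:
            # A NaN length simply propagates to a NaN result.
            out.append(c / per_kb / eff_lib_millions)
    return out
```

So if the values passed as gene lengths contain zeros or missing entries, or are not lengths at all, the output fills with Inf/NaN exactly as described.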
Double-check how many values in the _raw_counts and _length_normalized files are NaN or 0. Also, have a look at the edgeR user's guide; it explains in some detail what edgeR is doing.
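If it helps, that check can be done quickly outside R. The sketch below assumes (this is not confirmed for the TCGA files) a tab-separated table whose first row holds sample IDs and whose first column holds gene IDs; the function name is hypothetical.

```python
import csv
import math
from typing import Dict, Iterable


def summarize_matrix(lines: Iterable[str]) -> Dict[str, int]:
    """Count cells, NaNs, zeros and non-numeric entries in a gene x sample table."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row (sample IDs)
    stats = {"cells": 0, "nan": 0, "zero": 0, "non_numeric": 0}
    for row in reader:
        for cell in row[1:]:  # skip the gene ID column
            stats["cells"] += 1
            try:
                value = float(cell)
            except ValueError:
                stats["non_numeric"] += 1  # e.g. empty fields or "NA"
                continue
            if math.isnan(value):
                stats["nan"] += 1
            elif value == 0:
                stats["zero"] += 1
    return stats
```

Usage would be along the lines of opening each file with `open(path, newline="")` and passing the file object to `summarize_matrix`, where `path` is wherever the downloaded file lives.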
OK, I just found out that these three files downloaded from TCGA have exactly the same content:
LUSC.mRNAseq_median_length_normalized
LUSC.mRNAseq_raw_counts
LUSC.mRNAseq_RPKM
So there seems to be no gene-length information provided...
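To confirm that the files are byte-for-byte identical, rather than merely looking alike in the first rows, one option is to hash them and group equal digests. This is a minimal sketch; the file names come from the post, but their exact names and extensions on disk are an assumption, and the helper names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """SHA-256 digest of the concatenated byte chunks."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Read a file in binary mode, 1 MiB at a time."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(size)
            if not chunk:
                break
            yield chunk


def identical_groups(contents: Dict[str, Iterable[bytes]]) -> List[List[str]]:
    """Return groups (size > 1) of names whose contents hash identically."""
    by_digest = defaultdict(list)
    for name, chunks in contents.items():
        by_digest[sha256_hex(chunks)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

For example, `identical_groups({p: file_chunks(p) for p in paths})` with `paths` set to the three LUSC file paths would return a single group containing all three if they really are identical.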