I need to find out which normalisation (and possibly which batch correction method) was used to produce the Pan-Cancer Atlas mRNA expression matrix (the file 'EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv', found here).
Starting from the raw read counts obtained from the GDC and the same gene panel, I tried FPKM and FPKM-UQ normalisation as described here, but the resulting expression values are nowhere near the range of the values in the Pan-Cancer mRNA matrix. This may suggest that a cross-sample batch correction was applied.
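For concreteness, here is a minimal sketch of the FPKM and FPKM-UQ formulas as they are commonly stated; it is not necessarily the exact computation performed above or by the GDC pipeline. The function names, the dictionary inputs, the use of the summed counts of the supplied genes as the FPKM denominator, and the inclusion of zero-count genes in the upper quartile are all assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fpkm(counts, lengths):
    """FPKM for one sample: count * 1e9 / (total counts * gene length in bp).

    Assumption: the total is the sum over the genes supplied in `counts`.
    """
    total = sum(counts.values())
    return {g: c * 1e9 / (total * lengths[g]) for g, c in counts.items()}


def fpkm_uq(counts, lengths):
    """FPKM-UQ for one sample: the total is replaced by the 75th percentile count.

    Assumption: the quartile is taken over all supplied genes, zeros included.
    """
    uq = percentile(list(counts.values()), 75)
    return {g: c * 1e9 / (uq * lengths[g]) for g, c in counts.items()}
```

Both functions work on a single sample at a time, so neither can reproduce an adjustment that was computed across samples or batches.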
My goal is to normalise expression data from new samples, starting from raw read counts, together with the Pan-Cancer mRNA data, so that I get a unified expression matrix and can compare apples to apples.
Any information or alternative method would be greatly appreciated.
Also, is the expression matrix log-transformed?
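One rough way to probe the log-scale question is to look at the range of values: log2-scale expression rarely exceeds a few tens, whereas linear-scale values often reach the thousands or more. The sketch below is only a heuristic under that assumption; the function name and the threshold of 30 are hypothetical choices, and a batch-corrected matrix may not follow this pattern.

```python
def looks_log_transformed(values, max_log_value=30.0):
    """Heuristic guess: True if no value exceeds `max_log_value`.

    `max_log_value` is an arbitrary, assumed cut-off, not a documented constant.
    NaN entries (missing values) are ignored.
    """
    finite = [v for v in values if v == v]  # NaN is the only value != itself
    if not finite:
        raise ValueError("no non-missing values to inspect")
    return max(finite) <= max_log_value
```

A result of False (large maximum) points to linear-scale values; a result of True is weaker evidence, since a small or low-expression subset of a linear matrix could also stay under the threshold.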