I would like to pick cancer cell lines that have "high", "medium", and "low" expression of certain genes, where those levels are defined by the high, medium, and low expression of the same genes in cancer patient populations. To do that, I need to know the distribution of expression in patients before I can decide on my high, medium, and low cutoffs.
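To make the goal concrete, here is a minimal sketch of the binning I have in mind, in Python with NumPy. The expression values, the cell line names, and the choice of tertiles (33rd and 67th percentiles) as cutoffs are all assumptions for illustration, and it only makes sense once the patient and cell-line values are in the same units.

```python
import numpy as np

def tertile_cutoffs(patient_values):
    """Return the low/medium and medium/high boundaries
    as the 33rd and 67th percentiles of the patient values."""
    values = np.asarray(patient_values, dtype=float)
    low_cut, high_cut = np.percentile(values, [100 / 3, 200 / 3])
    return low_cut, high_cut

def classify(value, low_cut, high_cut):
    """Label one expression value relative to the patient-derived cutoffs."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

# Made-up patient expression values for one hypothetical gene
patient_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
low_cut, high_cut = tertile_cutoffs(patient_expr)

# Made-up cell-line values for the same gene, assumed to be in the same units
cell_lines = {"line_A": 1.5, "line_B": 5.5, "line_C": 9.5}
labels = {name: classify(v, low_cut, high_cut) for name, v in cell_lines.items()}
print(labels)
```

Other cutoffs (quartiles, or mean plus or minus a standard deviation) would slot into the same structure by changing `tertile_cutoffs`.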
I've gathered RNAseq data on cell lines from DepMap.org, and I also have RNAseq data on The Cancer Genome Atlas (TCGA) patient populations from cBioPortal. The problem is that DepMap.org's RNAseq units are log2(TPM+1), while the TCGA RNAseq units are RSEM. Is there a way to compare those numbers, either by converting one unit into the other or by finding another source that reports both in the same units?
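For reference, the DepMap transformation itself is easy to undo. The sketch below (Python with NumPy, made-up values, hypothetical function names) only shows log2(TPM+1) and its inverse. It does not address whether the cBioPortal RSEM values are on a TPM-like scale, which is the part I am unsure about.

```python
import numpy as np

def log2_tpm_plus1_to_tpm(x):
    """Invert log2(TPM + 1): TPM = 2**x - 1."""
    return np.exp2(np.asarray(x, dtype=float)) - 1.0

def tpm_to_log2_tpm_plus1(tpm):
    """Apply the DepMap-style transform: log2(TPM + 1)."""
    return np.log2(np.asarray(tpm, dtype=float) + 1.0)

# Made-up values in log2(TPM + 1) units
depmap_like = np.array([0.0, 1.0, 3.0])
tpm = log2_tpm_plus1_to_tpm(depmap_like)
round_trip = tpm_to_log2_tpm_plus1(tpm)
print(tpm, round_trip)
```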
Description of one of my TCGA files (for adrenocortical carcinoma - ACC), called meta_RNA_Seq_v2_expression_median.txt:
cancer_study_identifier: acc_tcga
genetic_alteration_type: MRNA_EXPRESSION
datatype: CONTINUOUS
data_filename: data_RNA_Seq_v2_expression_median.txt
stable_id: rna_seq_v2_mrna
show_profile_in_analysis_tab: false
profile_description: Expression levels for 20532 genes in 79 acc cases (RNA Seq V2 RSEM)
profile_name: mRNA expression (RNA Seq V2 RSEM)