I currently have allele-specific expression (ASE) data for 12 different cancer types from TCGA.
I've organized the data into matrices of log2(alleleA_readcounts/alleleB_readcounts), with one value per gene per patient. I've also converted these into a Z-score matrix (Z-scored across patients within each gene), but I'm not sure whether this is necessary.
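For concreteness, here is a minimal sketch of the two transformations described above, in Python with NumPy. It assumes genes are rows and patients are columns. The function names and the `pseudocount` parameter are hypothetical and only there for illustration (a pseudocount is one common way to avoid dividing by zero; whether to use one is a separate choice).

```python
import numpy as np


def log2_allelic_ratio(a_counts, b_counts, pseudocount=1.0):
    """log2(alleleA / alleleB) per gene (row) and patient (column).

    pseudocount is an illustrative assumption to avoid log2(0) and
    division by zero; set it to 0 to get the plain ratio.
    """
    a = np.asarray(a_counts, dtype=float)
    b = np.asarray(b_counts, dtype=float)
    return np.log2((a + pseudocount) / (b + pseudocount))


def zscore_per_gene(ratios):
    """Z-score each gene (row) across patients, ignoring NaNs."""
    ratios = np.asarray(ratios, dtype=float)
    mean = np.nanmean(ratios, axis=1, keepdims=True)
    std = np.nanstd(ratios, axis=1, keepdims=True)
    # Genes with no variation across patients get NaN instead of a
    # division-by-zero result.
    std[std == 0] = np.nan
    return (ratios - mean) / std


# Toy example: 2 genes x 3 patients.
allele_a = [[10, 20, 40], [5, 5, 5]]
allele_b = [[10, 10, 10], [5, 5, 5]]
ratios = log2_allelic_ratio(allele_a, allele_b, pseudocount=0.0)
zscores = zscore_per_gene(ratios)
```

Note that Z-scoring within each gene removes that gene's overall allelic bias and scale, so the result only reflects how each patient deviates from the gene's average; the raw log2 ratios keep the direction and size of the imbalance itself.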
Other data are also available, such as a matrix of p-values for the ASE calls, clinical data, etc.
I'm now trying to characterize/cluster the cancer types based on this data, but I'm unsure how to go about it. I'm in the process of developing an algorithm, but this may take a while.
Does anyone have any insight or ideas as to where I could go next?