I have multiple bulk RNA-seq datasets with metadata on response / no response to immune checkpoint blockade therapy.
I want to use Tumor Immune Dysfunction and Exclusion (TIDE) to predict response from the transcriptomic data.
For TIDE to work, the data must be normalized. My data is already TPM normalized. I tried running TIDE on the TPM values as they are, but they need some further scaling. According to the tutorial video, I need to compute log2(TPM + 1) and then "subtract average value across all patients".
Basically this is what I did in R:
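# Assumes genes are in rows and samples (patients) in columns
tpm_log <- log2(tpm + 1)
# Per-gene average across all samples
mean_across_samples <- rowMeans(tpm_log)
# Subtract each gene's average from its row (sweep's default FUN is "-")
tpm_final <- sweep(tpm_log, 1, mean_across_samples, FUN = "-")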
The problem is that many of the predictions made by TIDE are wrong. Is the normalization step OK?
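One way to sanity-check the centering step is to run it on a tiny made-up matrix and confirm that every gene ends up with a mean of about zero across samples. This is only a sketch: the toy values and the names `gene_means` and `tpm_centered` are illustrative, and it assumes genes are in rows and samples in columns. If the matrix were transposed, `rowMeans` with `MARGIN = 1` would center each sample instead of each gene.

```r
# Toy matrix: 3 genes (rows) x 4 samples (columns); values are made up.
tpm <- matrix(
  c(  0,  1,  3,  7,
     10, 10, 10, 10,
    100, 50, 25,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("sample", 1:4))
)

# Log-transform, then subtract each gene's mean across all samples.
tpm_log <- log2(tpm + 1)
gene_means <- rowMeans(tpm_log)
tpm_centered <- sweep(tpm_log, 1, gene_means, FUN = "-")

# After centering, each gene's mean across samples should be ~0
# (up to floating-point error).
max_abs_row_mean <- max(abs(rowMeans(tpm_centered)))
stopifnot(max_abs_row_mean < 1e-12)

# The shape and the gene/sample labels are unchanged by the transformation.
stopifnot(identical(dimnames(tpm_centered), dimnames(tpm)))
```

If this check passes on the real matrix as well, the arithmetic of the normalization matches the description in the tutorial, and the wrong predictions would have to come from something else (for example, which samples are averaged together).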
Additionally, I am facing another issue. I am uncertain about how to run the algorithm because my data covers four distinct types of cancer. Furthermore, some samples have undergone prior immunotherapy treatment, while others have not. Should I process each dataset independently and then combine the results into a single data frame? From a statistical standpoint, what would be the most appropriate approach?
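To make the "process each dataset independently" option concrete, here is a minimal sketch that centers each gene within each group of samples rather than across everything pooled together. It is not taken from the TIDE documentation: `center_within_groups` is a hypothetical helper, the group labels `typeA` and `typeB` and the random values are made up, and whether to group by cancer type, by prior treatment, or by both is an assumption that has to be decided for the actual data.

```r
# Hypothetical helper: subtract each gene's mean within each group of samples.
# log_expr: genes x samples matrix of log2(TPM + 1) values.
# groups:   one label per column of log_expr (e.g. cancer type).
center_within_groups <- function(log_expr, groups) {
  stopifnot(ncol(log_expr) == length(groups))
  centered <- log_expr
  for (g in unique(groups)) {
    idx <- which(groups == g)
    sub <- log_expr[, idx, drop = FALSE]
    # Center this group's samples on the group's own per-gene mean.
    centered[, idx] <- sweep(sub, 1, rowMeans(sub), FUN = "-")
  }
  centered
}

# Toy data: 5 genes x 6 samples, two made-up groups of 3 samples each.
set.seed(1)
tpm <- matrix(
  rexp(5 * 6, rate = 0.1), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)
cancer_type <- c("typeA", "typeA", "typeA", "typeB", "typeB", "typeB")

tpm_centered <- center_within_groups(log2(tpm + 1), cancer_type)
```

Because every column keeps its sample name, the per-group results already sit in one matrix and can be split again by `cancer_type` if each group has to be submitted separately. The trade-off is that group-wise centering removes any average expression difference between groups, which is the intent if those differences are mostly cohort or batch effects.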
Thank you
Hi, I would like to do the same thing. Have you found out in the meantime whether your normalization is correct? Thank you very much in advance!
Hi
Unfortunately, the data and code are lost: as you can see, this was done more than a year ago, and I have finished my Master's since then. It did work for me in the end, but I do not recall exactly how I normalized the data. I will update this comment if it turns out I kept a record of what I did.