Hi!
I am trying to compare bulk RNA-seq data from brain metastases to the primary tumors. I performed differential gene expression analysis (DGEA) using DESeq2 and shrunk the LFCs using apeglm.
While investigating the results table, I noticed that genes such as GFAP (glial fibrillary acidic protein), which are largely specific to the brain/CNS, have a high LFC in brain metastases compared to the primary. I think this might be because some of the brain tissue samples had low tumor purity (i.e., they contained more of the surrounding normal brain). I then used TidyEstimate to predict the stromal and immune infiltration scores for each of the samples to get an insight into each sample's purity.
My question is whether it makes sense to now add these two scores to my design in DESeq2.
For example: DESeqDataSetFromTximport(genes_results, sampleTable, ~0+batch+immune+stromal+biopsy), where sampleTable holds the sample-level information and genes_results contains the gene-level abundance estimates from RSEM, imported using tximport.
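To make the design concrete, here is a minimal base-R sketch of the model matrix I have in mind. The sample table below is made up (hypothetical batch labels, biopsy sites, and score values), not my real data. It only illustrates centering and scaling the two scores and checking that the resulting design is full rank before it would be passed to DESeq2.

```r
# Hypothetical sample table: all values are invented for illustration only.
set.seed(1)
sampleTable <- data.frame(
  batch   = factor(rep(c("b1", "b2"), times = 4)),
  biopsy  = factor(rep(c("primary", "brain_met"), each = 4),
                   levels = c("primary", "brain_met")),
  immune  = rnorm(8, mean = 500, sd = 300),  # stand-in for the immune score
  stromal = rnorm(8, mean = 200, sd = 400)   # stand-in for the stromal score
)

# Center and scale the continuous scores so their coefficients
# are on a comparable scale.
sampleTable$immune  <- as.numeric(scale(sampleTable$immune))
sampleTable$stromal <- as.numeric(scale(sampleTable$stromal))

# Same formula as in the DESeqDataSetFromTximport() call above.
design <- model.matrix(~ 0 + batch + immune + stromal + biopsy,
                       data = sampleTable)

# The design is usable only if no column is a linear combination of the others.
is_full_rank <- qr(design)$rank == ncol(design)

print(colnames(design))
print(is_full_rank)
```

With biopsy listed last and "primary" as the reference level, the last column of this matrix corresponds to the brain metastasis vs. primary effect adjusted for batch and the two scores, which is the coefficient I would pass to the apeglm shrinkage.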
I am worried that, since the scores were calculated from each sample's gene expression data, using them in DESeq2 might produce non-meaningful output, because the scores are linear combinations of the gene expression values.
Thank you in advance for your feedback.
Best, Abhishek