I combined multiple datasets into one. They are bulk RNA-seq datasets comparing primary cancer samples with metastatic cancer samples. I now have all the counts in one dataframe and all the metadata in another. I want to run a DESeq2 analysis of the two groups, and of course I want to use design = ~ condition, because I want the results to depend only on whether the cancer is primary or metastatic. The problem is that my results are being affected by the dataset of origin. In a PCA, for example, each dataset clusters on its own, which is not what I want. I have 7 datasets overall and I don't want the source (the dataset) to affect the results.
Should I adjust the design in DESeq2? Should I use RUVSeq? I'm a bit lost.
So this can't be done? Can't limma handle this kind of problem?
You have 7 datasets, and each is from a different study?
Yes. Can't I add the dataset number to the DESeq2 design or something?
Yes, but that only works if batch is not confounded with condition.
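To make "confounded" concrete: a dataset that contains only primary or only metastatic samples cannot separate the dataset effect from the condition effect, and if every dataset is like that, a design with both dataset and condition is not full rank. Below is a minimal sketch of how one might check this from the metadata before fitting anything. It is written in plain Python rather than R just to show the logic; the sample IDs, dataset labels, and function names are made up for illustration and are not from the original post.

```python
from collections import defaultdict

# Hypothetical metadata rows: (sample_id, dataset, condition).
# In practice these would come from the combined metadata table.
metadata = [
    ("s1", "D1", "primary"),
    ("s2", "D1", "metastatic"),
    ("s3", "D2", "primary"),
    ("s4", "D2", "primary"),
    ("s5", "D3", "metastatic"),
]


def conditions_per_dataset(rows):
    """Map each dataset to the set of conditions observed in it."""
    seen = defaultdict(set)
    for _sample, dataset, condition in rows:
        seen[dataset].add(condition)
    return dict(seen)


def single_condition_datasets(rows):
    """Return datasets that contain only one condition.

    Within these datasets, dataset and condition cannot be told apart.
    """
    per_dataset = conditions_per_dataset(rows)
    return sorted(d for d, conds in per_dataset.items() if len(conds) < 2)


def fully_confounded(rows):
    """True if no dataset contains more than one condition."""
    per_dataset = conditions_per_dataset(rows)
    return all(len(conds) < 2 for conds in per_dataset.values())


print(single_condition_datasets(metadata))  # ['D2', 'D3']
print(fully_confounded(metadata))           # False
```

In this toy example only D1 has both conditions, so only D1 would inform the primary vs metastatic comparison once dataset is included in the design. If `fully_confounded` returned True, adding dataset to the design would not help, because the dataset and condition effects could not be estimated separately.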