Hi, I’m going over some considerations for a project and would appreciate some advice/feedback.
I have two RNAseq datasets (disease1 + control1 and disease2 + control2) that I would like to combine. All samples come from our lab: they were prepared at the same time using the same protocol, sequenced together on the same flow cell, and processed and aligned with the same analysis pipeline, so all parameters are identical. We have separate control groups because the wt littermates from each disease group were used, but they share the same background strain. I call them two datasets only because they were initially analysed and published separately for each disease; technically, all samples were treated identically.
We now want to combine the two sets. I did some initial exploratory analysis, looking at sample-to-sample correlation and PCA, and did not see any obvious batch effect; the two control groups cluster together well and are tightly correlated, as we expected. My question is: would it be OK to essentially treat the two control groups as one group? Is there anything else I should consider when combining these datasets?
It should be fine. Since they were all collected, processed, and sequenced in a single batch I wouldn't expect any confounding batch effect. This appears to have been confirmed by your PCA too.
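If you want a number to go with the visual check, one rough option is to compare the mean sample-to-sample correlation within each control cohort against the mean correlation between the two cohorts. If they are about the same, that is consistent with there being no cohort effect. Below is a minimal sketch in plain Python. The expression values are synthetic and the names (`control1_rep1`, `mean_correlations`, etc.) are made up for illustration, so you would substitute your own log-transformed, normalised expression values.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlations(samples, cohort):
    """Return (mean within-cohort r, mean between-cohort r).

    samples: dict mapping sample name -> list of expression values
    cohort:  dict mapping sample name -> cohort label
    """
    within, between = [], []
    names = sorted(samples)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(samples[a], samples[b])
            if cohort[a] == cohort[b]:
                within.append(r)
            else:
                between.append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Synthetic stand-in data (NOT real measurements): 200 hypothetical genes,
# three replicates per control cohort, all drawn around the same baseline,
# i.e. simulated with no cohort effect.
rng = random.Random(0)
baseline = [rng.uniform(0, 10) for _ in range(200)]

samples = {}
cohort = {}
for label in ("control1", "control2"):
    for rep in range(1, 4):
        name = f"{label}_rep{rep}"
        samples[name] = [g + rng.gauss(0, 0.5) for g in baseline]
        cohort[name] = label

within_r, between_r = mean_correlations(samples, cohort)
print(f"mean within-cohort r:  {within_r:.3f}")
print(f"mean between-cohort r: {between_r:.3f}")
```

A between-cohort correlation clearly lower than the within-cohort one would be a reason to keep the cohort label as a covariate rather than pooling the controls. This is only a sanity check alongside the PCA, not a formal test.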