Hi all,
This is a follow-up to my question from a few days ago. The heatmap below shows clustering of samples that is not explained by sex (all male), age (all the same age), genotype (1/2), or the condition of interest (High/Low). I have no way of knowing whether this clustering is due to biological or technical variation. No further metadata is available. All samples were processed by the same person in the same lab over a few months, but I do not have the collection or processing dates. All were sequenced on the same date, on the same machine, and in the same lanes.
Should I run SVA on this data? There may be no hidden batch effect here at all, in which case I could just be removing biological signal.
I could plot a PCA, extract the genes driving the difference between the two main unknown groups shown in the heatmap, and then look at gene enrichment and network analysis, but would there be any point? That is a lot of work unrelated to the focus of the analysis, and I would still arrive back at the same question: should I remove the effect or not?
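For concreteness, the PCA step I have in mind is roughly the following. This is only a minimal sketch on made-up toy numbers, not my real data: the gene names, the expression values, and the helper `first_pc` are all hypothetical, and it uses plain power iteration instead of a proper PCA library just to stay self-contained. It finds the first principal component and ranks genes by the absolute size of their loading, which is what I would then feed into enrichment.

```python
import math
import random

def first_pc(matrix, n_iter=200, seed=0):
    """Return (loadings, scores) of the first principal component.

    matrix: list of samples, each a list of expression values per gene.
    Uses power iteration on the centered data (illustrative only).
    """
    n_samples = len(matrix)
    n_genes = len(matrix[0])
    # Center each gene on its mean across samples.
    means = [sum(row[j] for row in matrix) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    # Random unit-length starting vector.
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n_genes)]
    norm = math.sqrt(sum(w * w for w in v))
    v = [w / norm for w in v]

    for _ in range(n_iter):
        # Project samples onto v, then map back to gene space (X^T X v).
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v_new = [sum(centered[i][j] * scores[i] for i in range(n_samples))
                 for j in range(n_genes)]
        norm = math.sqrt(sum(w * w for w in v_new))
        if norm == 0:
            break  # no variance in the data
        v = [w / norm for w in v_new]

    scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
    return v, scores

# Hypothetical toy data: 6 samples x 4 genes, two hidden groups of 3.
genes = ["geneA", "geneB", "geneC", "geneD"]
expr = [
    [10.0, 2.0, 5.0, 7.1],
    [10.2, 2.1, 5.1, 7.0],
    [9.9, 1.8, 4.9, 6.9],
    [2.0, 9.0, 5.0, 7.0],
    [2.1, 9.2, 5.2, 7.1],
    [1.8, 8.9, 4.8, 6.9],
]

loadings, scores = first_pc(expr)
# Rank genes by how strongly they contribute to PC1.
ranked = sorted(zip(genes, loadings), key=lambda gl: -abs(gl[1]))
for gene, loading in ranked:
    print(f"{gene}\t{loading:+.3f}")
```

On real data I would do this on variance-stabilised counts and take the top-ranked genes into an enrichment tool, but that still only describes the unknown grouping; it does not tell me whether the grouping is technical or biological.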
Thanks in advance,
Kenneth