Dear BioStars,
One of the data quality assessment methods suggested in the DESeq2 documentation is sample clustering. I have an RNA-seq experiment with 8 treated and 6 untreated samples. All samples have good quality metrics according to FastQC. But when I plot the heatmap of the rlog-transformed data, below is what I get. Contrary to what I expected, the samples do not cluster cleanly by treatment. However, they don't look too bad either!
I wanted to ask for your expert advice. What would you do in this case? Would you exclude some samples from the DE analysis, or include them all and apply a method to remove batch effects, such as svaseq?
Thanks for your help in advance!
Noushin
Throwing out samples is an option if you have one or two outliers, but in your case several samples from one group correlate well with samples from the other group. This could be a batch effect; we see it all the time. It could also be mislabeling of samples, but I don't think that's the case here because people are normally careful with that :-). Improper multiplexing, low read depth, or differences in platforms or library prep methods could be behind this. You can try "svaseq", or sometimes it is easier to just include the known covariates in the design of your DESeq2 or edgeR analysis.
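To make the covariate suggestion concrete, here is a minimal sketch of a DESeq2 design that adjusts for a known batch variable. It is only an illustration under assumptions: the counts are simulated with makeExampleDESeqDataSet rather than taken from your experiment, and the `batch` column with levels "b1" and "b2" is hypothetical. You would replace it with whatever you actually recorded (library prep date, lane, platform, and so on).

```r
library(DESeq2)

# Simulated counts stand in for the real experiment (assumption):
# 1000 genes, 14 samples split 7/7 between conditions "A" and "B".
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 14)

# Hypothetical known covariate, e.g. library prep batch.
# It must not be completely confounded with condition,
# otherwise the model matrix is not full rank.
dds$batch <- factor(rep(c("b1", "b2"), length.out = ncol(dds)))
table(dds$condition, dds$batch)

# Put the covariate first and the variable of interest last.
design(dds) <- ~ batch + condition
dds <- DESeq(dds)

# Effect of condition, adjusted for batch.
resultsNames(dds)
res <- results(dds, name = "condition_B_vs_A")
summary(res)
```

If the source of the unwanted variation is not known, the surrogate variables estimated by svaseq can be added to the colData and placed in the design in the same way, ahead of the condition term.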