Dear all,
I recently ran an RNA-seq experiment and there seems to be quite a lot of variation within replicate groups. The experiment was done to a high standard, and all RNA extractions were done the same way on the same day to avoid batch effects. I used 3 replicates per group (more would probably have been better, in hindsight!). Samples were wild-type organisms collected from the field and acclimated for several weeks under the same conditions. I know gene expression is stochastic, but this data set seems rather tricky.
Does anyone have experience dealing with variability and dispersion within replicate groups, and can you point me towards a few methods? I've run edgeR and the model does not fit well. It is a single-factor experiment. I do get differentially expressed genes, but far fewer than I expected, especially given the magnitude of the difference between treatments.
Any ideas on how I can go about the analysis? Outliers are tricky to call with so few replicates; however, by the looks of it there is 1 outlier in one replicate group. Is it normal to remove outliers in RNA-seq?
Did you make MDS or PCA plots? Do you see that the groups are clustered together or not?
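If you want a quick numeric check before (or alongside) a plot, here is a minimal sketch of the kind of sample-to-sample distances an MDS plot is built on. It is plain Python on toy data, not a replacement for edgeR's plotMDS: the sample names, the counts, the 0.5 offset and the use of a root-mean-square distance over all genes are assumptions made for illustration.

```python
import math

def log_cpm(counts, offset=0.5):
    """Convert raw counts per sample to log2 counts-per-million.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    offset: small value added to avoid log(0); an assumption of this sketch.
    """
    logged = {}
    for sample, values in counts.items():
        lib_size = sum(values)
        logged[sample] = [
            math.log2((c + offset) / (lib_size + 1.0) * 1e6) for c in values
        ]
    return logged

def rms_distance(a, b):
    """Root-mean-square difference between two log-expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def distance_matrix(counts):
    """Pairwise distances between all samples, keyed by (sample, sample)."""
    logged = log_cpm(counts)
    names = sorted(logged)
    return {(s, t): rms_distance(logged[s], logged[t]) for s in names for t in names}

# Hypothetical toy counts: ctrl_3 is deliberately unlike the other controls.
toy_counts = {
    "ctrl_1": [100, 200, 50, 400],
    "ctrl_2": [110, 190, 55, 390],
    "ctrl_3": [300, 60, 150, 100],
    "trt_1": [50, 400, 25, 800],
    "trt_2": [55, 380, 30, 790],
}

dist = distance_matrix(toy_counts)
for (s, t), d in sorted(dist.items()):
    if s < t:
        print(f"{s} vs {t}: {d:.2f}")
```

A replicate whose distances to its own group are as large as its distances to the other group is the sort of sample that will sit apart on the MDS plot.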
Removing outliers is, in my opinion, not really appropriate for natural field experiments, is it? All samples are from the field, right, so this is probably real biological variation (and not caused by technical issues)?
Are you familiar with the work of Crawford and Whitehead on killifish? They used microarrays to investigate variation in the field.
I would say loosely clustered; 1 group not so much, as it overlaps with another group. My initial thought was that there is no logical reason to exclude a replicate: with n=3, how do you define a true 'outlier'? I would assume it's biological rather than technical variation, as the experiment, extraction protocol and lane sequencing were all done to a high standard. The dispersion is quite spread out in some groups. I'm just trying to think of a suitable method to analyze the data. Does any software provide some sort of dispersion shrinkage or re-modelling that would be viable?
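For what it's worth, packages such as edgeR and DESeq2 already moderate gene-wise dispersions towards a shared value or trend. To make the idea concrete, here is a toy sketch, and it is not what either package actually computes: it assumes counts from a single replicate group that are already library-size normalized, uses a method-of-moments dispersion under the negative binomial variance model var = mu + phi * mu^2, and shrinks each gene towards the median with a made-up `prior_weight`.

```python
from statistics import mean, median, variance

def mom_dispersion(counts):
    """Method-of-moments dispersion: phi = (var - mean) / mean^2, floored at 0."""
    m = mean(counts)
    if m == 0:
        return 0.0
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / (m * m), 0.0)

def shrink_dispersions(gene_counts, prior_weight=10.0):
    """Shrink each gene's raw dispersion towards the median across genes.

    gene_counts: dict gene -> list of replicate counts within one group.
    prior_weight: hypothetical weight given to the shared value; larger
    values pull noisy gene-wise estimates harder towards the median.
    """
    raw = {g: mom_dispersion(c) for g, c in gene_counts.items()}
    common = median(raw.values())
    df = len(next(iter(gene_counts.values()))) - 1  # replicates minus one
    return {
        g: (prior_weight * common + df * d) / (prior_weight + df)
        for g, d in raw.items()
    }

# Hypothetical counts for one replicate group with n = 3.
toy_group = {
    "gene_a": [100, 110, 90],
    "gene_b": [100, 300, 50],   # one replicate far from the others
    "gene_c": [200, 260, 180],
}

shrunk = shrink_dispersions(toy_group)
for gene in toy_group:
    print(gene, round(mom_dispersion(toy_group[gene]), 4), round(shrunk[gene], 4))
```

With only 2 degrees of freedom per gene the raw estimates are very noisy, which is why borrowing information across genes matters so much at n=3. It also means a single odd replicate inflates that gene's dispersion and costs you power, which may be part of why you see fewer DE genes than expected.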
You can have a look at the sva package to correct for unknown sources of variation.
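To give a rough feel for the idea behind it: you remove the known treatment effect and then look for structure left in the residuals. The sketch below is only that first step, in plain Python on made-up data. It is not the sva algorithm (which goes on to build and test surrogate variables), and the function names and toy values are assumptions for illustration.

```python
import math

def group_residuals(expr, groups):
    """Subtract each gene's group mean, leaving variation not explained by treatment.

    expr: list of gene rows, each a list of log-expression values per sample.
    groups: list of group labels, one per sample.
    """
    resid = []
    for row in expr:
        means = {}
        for g in set(groups):
            vals = [x for x, lab in zip(row, groups) if lab == g]
            means[g] = sum(vals) / len(vals)
        resid.append([x - means[lab] for x, lab in zip(row, groups)])
    return resid

def leading_sample_scores(resid, n_iter=200):
    """Leading eigenvector of the sample-by-sample cross-product of the residuals,
    found by power iteration. Returns one score per sample."""
    n = len(resid[0])
    cross = [[sum(r[i] * r[j] for r in resid) for j in range(n)] for i in range(n)]
    vec = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(n_iter):
        new = [sum(cross[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return [0.0] * n  # no residual structure at all
        vec = [x / norm for x in new]
    return vec

# Hypothetical log-expression: a hidden factor raises genes 2 and 4
# in the first sample of each group, on top of the treatment effect.
toy_groups = ["A", "A", "A", "B", "B", "B"]
toy_expr = [
    [5.0, 5.0, 5.0, 8.0, 8.0, 8.0],
    [7.0, 5.0, 5.0, 9.0, 7.0, 7.0],
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
    [4.0, 1.0, 1.0, 4.0, 1.0, 1.0],
]

scores = leading_sample_scores(group_residuals(toy_expr, toy_groups))
print([round(s, 3) for s in scores])
```

Samples that share a sign and a large score on the leading residual axis are the ones moving together for reasons other than treatment. With n=3 per group any such factor is estimated from very little data, so treat it with caution.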