I need some advice on the analysis of multiple CRISPR data sets that we are working on.
We have multiple CRISPR data sets spanning almost a year now, and I would like to combine them into one large data set to increase the statistical power. Many of them can be regarded as biological replicates, others just as different conditions.
We will definitely need to filter out some samples first, and there will be some batch effects.
Does anyone have experience with this kind of analysis?
What would be the best way to calculate the correlation of the different samples?
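To make the correlation question concrete, this is roughly what I have in mind: a minimal sketch, assuming a raw count matrix with guides as rows and samples as columns. The function names (`log_cpm`, `sample_correlation`) and the toy numbers are made up for illustration. If the counts are already normalized, the library-size step could be skipped.

```python
import numpy as np

def log_cpm(counts, pseudocount=1.0):
    """Scale each sample (column) to counts per million, then log2-transform."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0, keepdims=True)  # total reads per sample
    return np.log2(counts / lib_sizes * 1e6 + pseudocount)

def sample_correlation(counts):
    """Pearson correlation between samples (columns) on log2 CPM values."""
    return np.corrcoef(log_cpm(counts), rowvar=False)

# Toy matrix: 5 guides (rows) x 3 samples (columns); values are invented.
toy_counts = np.array([
    [100, 210,  90],
    [ 50,  95, 400],
    [300, 620,  20],
    [ 10,  25, 150],
    [200, 390,  60],
])
corr = sample_correlation(toy_counts)
print(np.round(corr, 2))
```

Samples that are replicates of each other should then show up as high off-diagonal values in `corr`, and outliers as rows with low correlation to everything else.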
I know that mageckFLUTE has a BatchRemove command. Has anyone worked with it already? Are there other ways to do this correction?
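As for other ways, the simplest thing I can think of is centering each guide within each batch on log-scale values. The sketch below shows only that idea. It is not a reimplementation of BatchRemove, and the function name `center_by_batch` and the toy values are hypothetical. It also assumes every batch contains a comparable mix of conditions, since otherwise the centering would remove real condition differences along with the batch effect.

```python
import numpy as np

def center_by_batch(log_values, batches):
    """Subtract each guide's within-batch mean, then add back its overall mean."""
    log_values = np.asarray(log_values, dtype=float)
    batches = np.asarray(batches)
    corrected = log_values.copy()
    overall_mean = log_values.mean(axis=1, keepdims=True)
    for batch in np.unique(batches):
        cols = batches == batch  # boolean mask of samples in this batch
        batch_mean = log_values[:, cols].mean(axis=1, keepdims=True)
        corrected[:, cols] = log_values[:, cols] - batch_mean + overall_mean
    return corrected

# Toy log-scale matrix: 2 guides (rows) x 4 samples (columns), two batches.
toy_log = np.array([
    [10.0, 10.2, 11.0, 11.2],
    [ 8.0,  8.1,  7.0,  7.1],
])
toy_batches = ["A", "A", "B", "B"]
corrected = center_by_batch(toy_log, toy_batches)
```

After this step every guide has the same mean in each batch, while the differences between samples within a batch are unchanged.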
Thanks for the help.
EDIT:
Sorry for the misunderstanding. Looking at the whole data set, we have replicates for each of the time points/conditions, though in varying numbers. I don't think the design is asymmetrical, as overall we will have a similar number of samples per time point (TP), provided we can correlate them and remove batch effects from the samples.
The data we have are sequencing libraries that we quantified and normalized using mageck and/or screenProcessing to get count matrices.
Please provide more details. The structure of your dataset is unknown, and currently sounds like it may be asymmetrical if only some conditions have replicates. Also, what is your readout? Is this sequencing data, or measurements of a continuous or discrete variable?
I just edited my post. Sorry for the missing information.