Dear all, hi!
First of all, I apologise for the title of the question; I tried to explain it as simply as possible.
I have data from a CRISPR screen. In this analysis we knock down (deactivate) genes to see how deactivating each gene affects cell behaviour (an increase or a decrease in the number of cells). We use single guide RNAs (sgRNAs) to target these genes, and in my data each gene has 5 sgRNAs. As we all know, CRISPR has many off-target effects, which create variability, so the sgRNAs for a gene might not behave consistently.
Below I share 2 of the 250 genes in our data frame. A 1 or 0 indicates whether the fold change is greater than 2 or not. Since our aim is to pick the genes that block proliferation, I would like to find the sgRNAs that show 1s consistently.
Previously, I summed the scores in each row and then took the mean of the row sums to get an expected gene score. However, even though the two genes below get similar scores this way, I would like to favor the bottom one, ZMYND8, because the consistency of a guide is more important. Therefore (1, 1, 1) should be scored as more valuable than (1, 0, 1), (0, 1, 1), etc.
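To make this concrete, here is a rough Python sketch of the score I used before, next to one ad hoc idea for rewarding consistency. The `consistency_score` function and its `power` parameter are only my own illustration (an assumption, not a known statistical method); the data are the two tables from this post.

```python
# Binary hit calls (1 = fold change > 2) copied from the two tables in this post.
# Rows are sgRNAs 1..5, columns are set5, set6, set7.
acat1 = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
]
zmynd8 = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def mean_row_sum(guides):
    """Previous approach: sum each sgRNA row, then average over the sgRNAs."""
    return sum(sum(row) for row in guides) / len(guides)


def consistency_score(guides, power=2):
    """Illustrative alternative (an assumption, not an established method):
    raise each guide's hit fraction to a power > 1 before averaging, so that
    a fully consistent guide (1, 1, 1) counts disproportionately more than
    a partly consistent one such as (1, 0, 1)."""
    return sum((sum(row) / len(row)) ** power for row in guides) / len(guides)


if __name__ == "__main__":
    for name, guides in [("ACAT1", acat1), ("ZMYND8", zmynd8)]:
        print(name, mean_row_sum(guides), consistency_score(guides))
```

With `power=1` the second score is just the first one rescaled, so the choice of `power` is exactly the arbitrary part I would like to replace with something principled.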
Forgive my ignorance, but is there a known statistical method that will give me what I am asking for?
My aim is to weight these sgRNAs inversely to the probability of their hit pattern, so that a rare event (such as 1, 1, 1) dominates the more common ones.
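As a rough sketch of what I mean by inverse-probability weighting, in Python: each guide is weighted by one over the chance of being a hit in all the sets where it was a hit. This treats the sets as independent, which is my own simplifying assumption, and the function names are hypothetical. Here the per-set hit rates are estimated from only the 10 guides shown in this post; in practice I would estimate them from all 250 genes.

```python
from math import prod

# Binary hit calls copied from the two tables in this post
# (rows: sgRNAs 1..5, columns: set5, set6, set7).
acat1 = [[1, 1, 0], [1, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
zmynd8 = [[0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]


def column_hit_rates(all_guides):
    """Fraction of guides called as hits (1) in each set (column)."""
    n = len(all_guides)
    return [sum(col) / n for col in zip(*all_guides)]


def inverse_prob_weight(row, hit_rates):
    """Weight = 1 / (product of the hit rates of the sets where this guide hit).
    Assumes independent sets; a guide with no hits gets weight 0.
    Hit rates must be non-zero for every set in which the guide is a hit."""
    if not any(row):
        return 0.0
    return 1.0 / prod(p for x, p in zip(row, hit_rates) if x == 1)


def gene_score(guides, hit_rates):
    """Average inverse-probability weight over a gene's guides."""
    return sum(inverse_prob_weight(r, hit_rates) for r in guides) / len(guides)


if __name__ == "__main__":
    # Background rates from the 10 example guides only (illustration).
    rates = column_hit_rates(acat1 + zmynd8)
    print("hit rates:", rates)
    print("ACAT1 :", gene_score(acat1, rates))
    print("ZMYND8:", gene_score(zmynd8, rates))
```

With this weighting a (1, 1, 1) guide outweighs any guide with fewer hits, but I do not know whether it is statistically sound, which is why I am asking.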
Thank you for your help (and sorry for my bad English). Best,
Tunc.
PS: please don't hesitate to suggest a better title than the current one if it would make the question easier to understand.
sgRNA           set5  set6  set7
ACAT1_AT_1      1     1     0
ACAT1_AT_2      1     0     1
ACAT1_AT_3      1     0     0
ACAT1_AT_4      0     0     0
ACAT1_AT_5      0     1     0

sgRNA           set5  set6  set7
ZMYND8_BROMO_1  0     1     0
ZMYND8_BROMO_2  1     1     1
ZMYND8_BROMO_3  0     0     0
ZMYND8_BROMO_4  1     1     1
ZMYND8_BROMO_5  0     0     0