I have an expression matrix with 2 conditions, A and B, with 30 cells in condition A and 40 cells in condition B. I want to perform a Wilcoxon rank-sum test for each gene in this matrix. Is it OK to perform the following pre-processing steps?

1. Cutoff setting: only a gene expressed in at least 5 cells in either A or B is considered for the next steps. If a gene is expressed in fewer than 5 cells in one condition, set all of its expression values in that condition to 0.
2. Should I remove the zeros from A and B and perform the Wilcoxon rank-sum test on the remaining non-zero values? After the pre-processing in step 1, if all expression values of a gene in condition B are zero (but not all zero in A), use 5 zeros for B and perform the test against the non-zero values in A.
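To make the procedure concrete, here is a minimal sketch of step 1 followed by a rank-sum test for a single gene. It uses only the Python standard library, and the function names (`apply_cutoff`, `midranks`, `rank_sum_test`) and the toy data are hypothetical. The sketch assumes "expressed" means a value greater than 0, keeps the zeros in both groups (so it does not do the zero removal asked about in step 2), and uses the normal approximation with tie correction rather than an exact p-value.

```python
import math

MIN_CELLS = 5  # cutoff from step 1


def apply_cutoff(a, b, min_cells=MIN_CELLS):
    """Step 1: return None if the gene fails the cutoff in both conditions;
    otherwise zero out any condition with fewer than min_cells expressing cells."""
    n_a = sum(v > 0 for v in a)
    n_b = sum(v > 0 for v in b)
    if n_a < min_cells and n_b < min_cells:
        return None
    a_out = list(a) if n_a >= min_cells else [0] * len(a)
    b_out = list(b) if n_b >= min_cells else [0] * len(b)
    return a_out, b_out


def midranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided rank-sum test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for a, p-value)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy gene: expressed in 10 of 30 cells in A and in 3 of 40 cells in B.
gene_a = [0] * 20 + list(range(1, 11))
gene_b = [0] * 37 + [2, 4, 6]

filtered = apply_cutoff(gene_a, gene_b)
if filtered is not None:
    a_vals, b_vals = filtered  # B is zeroed out: only 3 expressing cells
    u_stat, p_value = rank_sum_test(a_vals, b_vals)
    print(u_stat, p_value)
```

Keeping the zeros means the test still compares all 30 cells against all 40, so the many ties at zero are handled by the tie correction instead of being discarded.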
Why are you using an inferior methodology rather than using one of the many established R packages?