Hi everyone,

I am a bit puzzled about which statistical test I should use for my data. I have a subset of genes (selected by the methylation status of the promoter) and expression data (FPKM) for two lineage-related cell types (FACS-sorted from one animal). The question is whether expression differs between cell type A and cell type B when the promoter is annotated as methylated in A according to my dataset.

Right now I am using a Wilcoxon signed-rank test with continuity correction (Wilcoxon because the data are not normally distributed, and paired because I analyse the same set of genes in cell type A and cell type B). However, I notice what seems to be a bias when testing bigger versus smaller subsets of genes. Bigger subsets (let's say 500 genes) seem to always come out significant, although the differences (at least by eye) don't look "big". Smaller subsets (around 50 genes), on the other hand, are not significant, although the boxplot looks much (!!!) more convincing than for the big subset.

Is the Wilcoxon signed-rank test the correct choice, or is there another test that also takes into account how many genes are tested?
Thanks a lot, Flo
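To illustrate the subset-size effect described above, here is a minimal sketch in plain Python. It is not the analysis from the post: the per-gene differences are made up, and the helpers `average_ranks` and `signed_rank_summary` are hypothetical names that use the normal approximation with continuity correction. The same pattern of differences is repeated to build a 50-gene and a 500-gene set, so the effect size (matched-pairs rank-biserial correlation) stays almost the same while the p-value changes a lot.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        avg = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for j in range(start, end + 1):
            ranks[order[j]] = avg
        start = end + 1
    return ranks


def signed_rank_summary(diffs):
    """Two-sided p-value (normal approximation, continuity-corrected)
    and matched-pairs rank-biserial correlation for paired differences."""
    d = [x for x in diffs if x != 0]  # zero differences are dropped
    n = len(d)
    abs_d = [abs(x) for x in d]
    ranks = average_ranks(abs_d)
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    total = n * (n + 1) / 2
    # Variance of W+ under the null, corrected for tied absolute differences
    tie_term = sum(t**3 - t for t in Counter(abs_d).values())
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = max(abs(w_plus - total / 2) - 0.5, 0.0) / math.sqrt(var)
    p_value = math.erfc(z / math.sqrt(2))
    effect = (w_plus - w_minus) / total  # ranges from -1 to 1
    return p_value, effect


# Made-up per-gene differences (e.g. log2 expression, cell type A minus B)
pattern = [-0.1, 0.2, 0.3, -0.4, 0.5, -0.6, 0.7, 0.8, -0.9, 1.0]
small_set = pattern * 5    # 50 genes
large_set = pattern * 50   # 500 genes, same distribution of differences

p_small, r_small = signed_rank_summary(small_set)
p_large, r_large = signed_rank_summary(large_set)
print(f"50 genes:  p = {p_small:.3g}, rank-biserial r = {r_small:.3f}")
print(f"500 genes: p = {p_large:.3g}, rank-biserial r = {r_large:.3f}")
```

With these made-up numbers the 50-gene set stays above p = 0.05 while the 500-gene set falls far below it, even though the effect size is practically identical. The p-value measures evidence against "no difference", which grows with the number of genes; it does not measure how large the difference is.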
Thanks for the fast response. I have to say the results I get from the above-mentioned approach make biological sense. Genes where I expect a change (because they are methylated, for example) show a change, whereas other gene sets (for example 500 random genes) show nothing. The problem comes when I subgroup the regions, for example to only genes where methylation is gained at predicted enhancers (which narrows my gene set down from 500 to, let's say, 50). The boxplot looks impressive and the changes are still ridiculously significant. Since we test ten times fewer genes and still reach the same ridiculously low p-value, I would guess that methylation changes at enhancers cause a much larger and more robust change than if I take all regions with higher methylation (which makes biological sense). But how can I put this in numbers? Is there no way?
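One possible way to put this in numbers is to report an effect size with a confidence interval for each gene set next to the p-value. The sketch below is only an illustration under assumptions: the FPKM values are made up, the pseudocount of 1 is an arbitrary choice to avoid dividing by zero, and `log2_fold_changes` and `bootstrap_median_ci` are hypothetical helper names. It computes the median per-gene log2 fold change (A over B) and a percentile bootstrap interval for it.

```python
import math
import random
import statistics


def log2_fold_changes(fpkm_a, fpkm_b, pseudocount=1.0):
    """Per-gene log2((A + pseudocount) / (B + pseudocount)) for paired FPKM."""
    return [math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(fpkm_a, fpkm_b)]


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Made-up FPKM values, for illustration only (one entry per gene)
gene_sets = {
    "all methylated": (
        [4.0, 10.0, 2.5, 7.0, 1.0, 12.0, 3.0, 6.0],   # cell type A
        [5.0, 9.0, 4.0, 8.5, 1.5, 11.0, 5.0, 7.0],    # cell type B
    ),
    "enhancer subset": (
        [0.5, 1.0, 0.2, 2.0, 0.8, 1.5, 0.3, 1.1],     # cell type A
        [6.0, 9.0, 3.0, 15.0, 4.0, 12.0, 5.0, 8.0],   # cell type B
    ),
}

for name, (fpkm_a, fpkm_b) in gene_sets.items():
    lfc = log2_fold_changes(fpkm_a, fpkm_b)
    lo, hi = bootstrap_median_ci(lfc)
    print(f"{name}: median log2(A/B) = {statistics.median(lfc):.2f}, "
          f"95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```

Unlike the p-value, the median fold change does not shrink or grow just because a set has 50 or 500 genes; only the width of the interval depends on the set size. Comparing the medians and intervals of the enhancer subset and the full set would then express "larger and more robust change" directly.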