I'm trying out some metrics to filter a list of gene names, each with an associated numerical value corresponding to its abundance.
I'm looking for some interesting genes and want to see whether they appear near the top of the gene list (ranked by abundance) before and after filtering. Please see the enclosed figure.
I wish to see whether the rank of my gene of interest (G) changes significantly after filtering, and I'm thinking of using Fisher's exact test for proportions to assess the significance.
So, given the rank of the gene of interest, I would count the genes ranked above and below it, with and without filtering, and compare the two proportions using Fisher's exact test.
If there are 200 genes above and 2800 below G in the unfiltered data, and 30 above and 100 below in the filtered data, then I can do the test as follows in R (first row: filtered, second row: unfiltered):
> my.mat <- matrix(c(30, 100, 200, 2800), nrow = 2, byrow = TRUE)
> my.mat
     [,1] [,2]
[1,]   30  100
[2,]  200 2800
> fisher.test(my.mat, alternative = "two.sided")
Please let me know whether Fisher's exact test is appropriate AT ALL for this kind of measurement.
Thanks
I would suggest a permutation test. Randomly permute your lists many times and see how often you get a change as large as the real one.
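A minimal sketch of one way to set this up in R, not a definitive recipe. It rests on assumptions that are not in the question: the abundances are simulated (only counts were given), and the null model is that the filter keeps G plus a random set of 130 other genes regardless of abundance. All variable names are made up for illustration.

```r
set.seed(1)  # reproducible resampling

# Hypothetical data: the question gives only counts, so abundances are simulated.
n.unfiltered <- 3001                       # 200 above G + G itself + 2800 below G
abundance <- sort(rexp(n.unfiltered), decreasing = TRUE)
g.index <- 201                             # G has 200 genes ranked above it
others <- setdiff(seq_len(n.unfiltered), g.index)

# Observed after filtering: 30 of the 130 other retained genes rank above G.
n.kept.others <- 130
observed.above <- 30

# Null model (an assumption): the filter keeps a random subset of the other
# genes. For each random subset, count how many kept genes rank above G.
n.perm <- 10000
null.above <- replicate(n.perm, {
  kept <- sample(others, n.kept.others)
  sum(abundance[kept] > abundance[g.index])
})

# Two-sided empirical p-value: how often is a random subset at least as far
# from the null expectation as the observed count? The +1 terms avoid p = 0.
expected <- n.kept.others * 200 / 3000
p.value <- (sum(abs(null.above - expected) >= abs(observed.above - expected)) + 1) /
  (n.perm + 1)
p.value
```

With real data you would replace the simulated `abundance` vector and `g.index` with your own values, and the observed count with the one from your filtered list. Under this null, about 8.7 of the 130 retained genes (130 × 200/3000) are expected to rank above G.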
In general I don't see major flaws with that. However, Fisher's test assumes that all observations are independent of each other, which might not be the case for the gene you are looking at.
Many thanks, Phil and Martomobo. Phil, can you please explain how Fisher's assumption of independence could be violated here?
Imagine you are investigating a gene that codes for a "master" regulator somewhere upstream in a pathway, or one that even regulates a whole pathway. Such a gene will influence abundance levels significantly, and with them the position of one or more genes in your abundance-based ranking. I know it is a special case, but you should at least keep in mind that some of the genes are not independent of each other. I guess that if your gene set is large enough you can ignore this to some extent, which is why I would give it a try. IMHO, it would be nice to see whether the permutation test gives results comparable to the Fisher's approach.