I am looking at the distribution patterns of a particular pair of genes in bacteria. One of the characteristics I am studying is genome size. I found that organisms carrying this gene pair tend to have significantly larger genomes (Wilcoxon rank-sum test) than organisms that lack the pair entirely.
An obvious concern is that the probability of a genome carrying any given gene pair increases with genome size, simply because larger genomes contain more genes. If so, the difference described in the previous paragraph could be just an artifact of this probabilistic effect.
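To make the concern concrete, here is a toy sketch in Python. It assumes, purely for illustration, that a genome is a uniform random draw of `n_genes` distinct gene families from a pool of `pool_size` families; real genomes certainly do not behave this way, and the function names and all numbers are made up. Under that assumption, the chance of holding both members of a fixed pair is n(n-1) / (N(N-1)), which grows with the number of genes even when there is no biology involved at all.

```python
import random


def prob_has_pair(n_genes, pool_size):
    """Exact chance that two specific families are both present when
    n_genes distinct families are drawn uniformly from pool_size.
    (Toy null model: uniform sampling is an assumption, not biology.)"""
    return n_genes * (n_genes - 1) / (pool_size * (pool_size - 1))


def simulate_has_pair(n_genes, pool_size, n_trials=5000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        genome = set(rng.sample(range(pool_size), n_genes))
        # Families 0 and 1 stand in for the gene pair of interest.
        if 0 in genome and 1 in genome:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    pool_size = 200  # hypothetical number of gene families
    for n_genes in (20, 50, 100, 150):  # hypothetical genome sizes
        exact = prob_has_pair(n_genes, pool_size)
        approx = simulate_has_pair(n_genes, pool_size)
        print(n_genes, round(exact, 4), round(approx, 4))
```

A randomization protocol would presumably need to compare the observed genome-size difference against what a null model of this kind produces, which is the sort of analysis I am hoping to find in the literature.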
Could someone please point me to papers in which this issue has been addressed, for example through statistical randomization protocols?