I want to test the hypothesis that two features of bacteria are correlated with each other. I have a large set of bacteria for which I know both of these features. However, different taxonomic groups are represented unequally in my set, which means that any simple statistical test would be biased. Is there an effective way to avoid this phylogenetic bias?
Dear Abascal, thank you very much for your answer! BayesTraits looks like a very good idea, but it analyses ancestral states, so it does not seem applicable to the evolution of bacteria because of the large evolutionary distances between them. What do you mean by "analyse the correlation within a phylogenetic context"?
The idea of manually removing some genomes is good, but there is one problem: it is unclear which taxonomic level I should consider. For example, if I choose only 1 genome per order, my set could still contain 7 orders belonging to the phylum Proteobacteria and 13 orders of Firmicutes, so at the phylum level my set will be biased too... So my set will be biased in different ways depending on the taxonomic level I consider...
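To make the problem concrete, here is a minimal Python sketch. The genome records, the helper `one_per_group`, and the taxonomic assignments are all made up for illustration and are not my real data. It keeps one genome per order and then counts how many of the kept genomes fall into each phylum:

```python
from collections import Counter

# Hypothetical (genome, phylum, order) records, invented for illustration only.
genomes = [
    ("g1", "Proteobacteria", "Enterobacterales"),
    ("g2", "Proteobacteria", "Enterobacterales"),
    ("g3", "Proteobacteria", "Pseudomonadales"),
    ("g4", "Proteobacteria", "Rhizobiales"),
    ("g5", "Firmicutes", "Bacillales"),
    ("g6", "Firmicutes", "Lactobacillales"),
    ("g7", "Actinobacteria", "Actinomycetales"),
]

def one_per_group(records, key_index):
    """Keep the first record seen for each distinct value at position key_index."""
    seen = set()
    kept = []
    for rec in records:
        if rec[key_index] not in seen:
            seen.add(rec[key_index])
            kept.append(rec)
    return kept

# Thin to one genome per order (index 2), then count genomes per phylum (index 1).
thinned = one_per_group(genomes, key_index=2)
phylum_counts = Counter(rec[1] for rec in thinned)
print(phylum_counts)
```

Even after thinning, the phyla remain unequally represented, and thinning at the phylum level instead would discard most of the data.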
BayesTraits should work fine with your data; I don't think large evolutionary distances are an issue here. "Analyse the correlation within a phylogenetic context" means that your data are not independent samples: they are already "correlated" by common descent, which is why you have to test for correlated evolution using a phylogenetic tree. You can read this: https://en.wikipedia.org/wiki/Phylogenetic_comparative_methods
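To give a flavour of what such methods do, here is a minimal Python sketch of one classic phylogenetic comparative method, Felsenstein's independent contrasts. This is not what BayesTraits does internally. It assumes continuous traits and a fully resolved tree with known branch lengths. The tree, the trait values, and the function names `contrasts` and `contrast_correlation` are all hypothetical, chosen only for illustration. Each internal node yields one standardized difference between its two descendant lineages, so a tree with n tips gives n - 1 contrasts that can be treated as independent.

```python
import math

def contrasts(node, trait):
    """Return (value, branch_length, contrasts) for a subtree.

    A leaf is (name, branch_length); an internal node is
    (left_subtree, right_subtree, branch_length).
    """
    if len(node) == 2:
        name, length = node
        return trait[name], length, []
    left, right, length = node
    x1, v1, c1 = contrasts(left, trait)
    x2, v2, c2 = contrasts(right, trait)
    # Standardized contrast between the two descendant lineages.
    c = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to reflect estimation uncertainty.
    return value, length + v1 * v2 / (v1 + v2), c1 + c2 + [c]

def contrast_correlation(tree, trait_x, trait_y):
    """Correlation of the two sets of contrasts, computed through the origin."""
    cx = contrasts(tree, trait_x)[2]
    cy = contrasts(tree, trait_y)[2]
    sxy = sum(a * b for a, b in zip(cx, cy))
    return sxy / math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))

# Hypothetical tree ((A:1,B:1):3,(C:1,D:1):3) and made-up trait values.
tree = ((("A", 1.0), ("B", 1.0), 3.0), (("C", 1.0), ("D", 1.0), 3.0), 0.0)
x = {"A": 1.0, "B": 1.2, "C": 5.0, "D": 5.1}
y = {"A": 2.1, "B": 2.0, "C": 8.0, "D": 8.2}

cx = contrasts(tree, x)[2]
r = contrast_correlation(tree, x, y)
print(len(cx), r)
```

The design point is that a large difference between two big clades enters the test once, as a single contrast at their common ancestor, rather than once per genome, which is exactly the overcounting that unequal taxon sampling causes.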